Abstract

Scheduling on related machines ($Q\|C_{\max}$) is one of the most important problems in the
field of Algorithmic Mechanism Design. Each machine is controlled by a selfish agent and 
her valuation can be expressed via a single parameter, her speed. In contrast to other similar 
problems, Archer and Tardos [4] showed that an algorithm that minimizes the makespan can 
be truthfully implemented, although in exponential time. On the other hand, if we leave out 
the game-theoretic issues, the complexity of the problem has been completely settled — the 
problem is strongly NP-hard, while there exists a PTAS [9, 8]. 

This problem is the most well studied in single-parameter algorithmic mechanism design. 
It gives an excellent ground to explore the boundary between truthfulness and efficient com- 
putation. Since the work of Archer and Tardos, quite a lot of deterministic and randomized 
mechanisms have been suggested. Recently, a breakthrough result [7] showed that a randomized 
truthful PTAS exists. On the other hand, for the deterministic case, the best known approxi- 
mation factor is 2.8 [11, 12]. 

It has been a major open question whether there exists a deterministic truthful PTAS, or 
whether truthfulness has an essential, negative impact on the computational complexity of the 
problem. In this paper we give a definitive answer to this important question by providing a 
truthful deterministic PTAS. 

1 Introduction 

Algorithmic Mechanism Design (AMD) is an area originated in the seminal paper by Nisan and 
Ronen [15, 16] and it has flourished during the last decade. It studies combinatorial optimization 
problems, where part of the input is controlled by selfish agents that are either unmotivated to 
report them correctly, or strongly motivated to report them erroneously, if a false report is prof- 
itable. In classical mechanism design more emphasis has been put on incentives issues, and less 
to computational aspects of the optimization problem at hand. On the other hand, traditional 
algorithm design disregards the fact that in some settings the agents might have incentive to lie. 
Therefore, we end up with algorithms that are fragile against selfish behavior. AMD carries chal- 
lenges from both disciplines, aiming at the design of qualitative algorithms that, at the same time, 
give incentives to selfish users to report truthfully, and so are also immune to strategic behavior. 

A fundamental optimization problem that has been suggested in [16] as a ground to explore 
the design of truthful mechanisms, is the scheduling problem, where a set of n tasks need to be 
processed by a set of m machines. There are two important variants with respect to the processing 
capabilities of the machines, that have been studied within the AMD framework. The machines 
can be unrelated, i.e., each machine i needs $t_{ij}$ units of time to process task j; or related, where
machine i comes with a speed $s_i$, while task j has processing requirement $p_j$, that is, $t_{ij} = p_j/s_i$ (we
will use the standard notation $Q\|C_{\max}$ to refer to the latter problem). The objective is to allocate
the jobs to the machines so that the maximum finish time of the machines, i.e., the makespan is 
minimized. 
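As a small illustration of the objective, the following sketch computes the makespan of a given assignment on related machines. The function name `makespan` and the list-based encoding of an assignment are assumptions made for this example only.

```python
from fractions import Fraction

def makespan(jobs, speeds, assignment):
    """Return the makespan of a schedule on related machines.

    jobs[j] is the processing requirement p_j, speeds[i] is the speed s_i,
    and assignment[j] is the index of the machine that receives job j.
    """
    workload = [Fraction(0)] * len(speeds)
    for j, machine in enumerate(assignment):
        workload[machine] += Fraction(jobs[j])
    # Machine i finishes at time (total work on i) / s_i.
    return max(w / Fraction(s) for w, s in zip(workload, speeds))

# Three jobs on two machines with speeds 1 and 2.
example = makespan([4, 2, 2], [1, 2], [1, 0, 1])
```

Exact rational arithmetic is used so that comparisons between schedules are not affected by rounding.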

In the game-theoretic setting, it is assumed that each machine i is a rational agent who controls
the private values of row $t_i$. It is further assumed that each machine wants to minimize its
completion time, and in the absence of incentives it will lie if this tricks the algorithm into assigning less
work to it. In order to motivate the machines to cooperate, we pay them to execute the tasks.
A mechanism consists of two parts: an allocation algorithm that assigns the tasks to the machines,
and a payment scheme that compensates the machines in monetary terms. We are interested in
devising truthful mechanisms in dominant strategies, where each player maximizes his utility by
telling the truth, regardless of the reports of the other players.

The scheduling problem provides an excellent framework to study the computational aspects of 
truthfulness. It is a well-studied problem from the algorithmic perspective with a lot of algorithmic 
techniques that have been developed. Moreover, it is conceptually close to combinatorial auctions, 
so that solutions and insights can be transferred from the one problem to the other. Indeed, the 
scheduling problem comes with a variety of objectives to be optimized, that are different than the 
objectives used in classical mechanism design. 

From the traditional algorithmic point of view, the computational complexity of the related 
machines case problem is completely settled: There is a polynomial time approximation scheme 
(PTAS) [9] for an arbitrary number of machines, and an FPTAS [10] when the number of machines 
is fixed. The general case is strongly NP-complete, so we don't expect to find an FPTAS unless 
P=NP. 

The mechanism design version of scheduling on related machines was first studied by Archer 
and Tardos [4]. It is the most central and well-studied among single-parameter problems, where 
each player controls a single real value and his objective is proportional to this value (see Chapters 
9 and 12 of [14] for a precise definition). Myerson [13] gave a characterization of truthful algorithms 
for one-parameter problems, in terms of a monotonicity condition. Archer and Tardos [4] found 
a similar monotonicity characterization, and using it they showed that a certain type of optimal 
allocation is monotone and consequently truthful (albeit exponential-time). 
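In the related-machines setting, monotonicity means that the work assigned to a machine must not decrease when that machine reports a higher speed while the other reports stay fixed. The sketch below tests this property on a finite grid of candidate speeds. The helper `work_is_monotone` and the two toy allocation rules are hypothetical and serve only as illustration; passing the grid test is evidence, not a proof, of monotonicity.

```python
def work_is_monotone(allocate, jobs, speeds, machine, candidate_speeds):
    """Check on a finite grid that the work of `machine` never decreases
    when its reported speed increases, with all other speeds held fixed.

    allocate(jobs, speeds) must return a list of job lists, one per machine.
    """
    previous = None
    for s in sorted(candidate_speeds):
        trial = list(speeds)
        trial[machine] = s
        work = sum(allocate(jobs, trial)[machine])
        if previous is not None and work < previous:
            return False
        previous = work
    return True

def all_to_fastest(jobs, speeds):
    # Toy rule: every job goes to the fastest machine (lowest index on ties).
    best = max(range(len(speeds)), key=lambda i: (speeds[i], -i))
    return [list(jobs) if i == best else [] for i in range(len(speeds))]

def all_to_slowest(jobs, speeds):
    # Toy rule that rewards over-reporting: every job goes to the slowest machine.
    worst = min(range(len(speeds)), key=lambda i: (speeds[i], i))
    return [list(jobs) if i == worst else [] for i in range(len(speeds))]
```

The first toy rule passes the check on any grid, while the second fails as soon as the grid contains a speed above that of another machine.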

The fact that truthfulness does not exclude optimality, in contrast to the multi-parameter 
variant of scheduling (the unrelated case)^1, makes the problem an appropriate example to explore
the interplay between truthfulness and computational complexity. It has been a major open problem
whether or not a deterministic monotone PTAS exists for $Q\|C_{\max}$.^2 In this work, we give a
definitive positive answer to that central question and settle the problem.

1.1 Related Work 

Auletta et al. [5] gave the first deterministic polynomial-time monotone algorithm for any fixed 
number of machines, with approximation ratio 4. This result was improved to an FPTAS by 
Andelman et al. [2]. For an arbitrary number of machines, Andelman, Azar, and Sorani [1] gave a 
5-approximation deterministic truthful mechanism, and Kovács improved the approximation ratio
to 3 [11] and to 2.8 [12], which was the previous record for the problem. 

^1 With scheduling on unrelated machines, we are more in the dark (see [6] for a recent overview of results).
There are impossibility results showing that no truthful mechanism achieves an approximation ratio
better than a constant, even in exponential time. Therefore, more primitive questions need to be answered before we
settle the complexity of the problem. The only known algorithm for the problem is VCG, which has approximation
ratio equal to the number of machines.

^2 We say that a mechanism runs in polynomial time when both the allocation algorithm and the payment algorithm
run in polynomial time. 
Randomization has been successfully applied. There are two major concepts of randomization
of truthful mechanisms: universal truthfulness and truthfulness-in-expectation. The first notion is
stronger, and consists of randomized mechanisms that are probability distributions over deterministic
truthful mechanisms. In the latter notion, by telling the truth a player maximizes his expected
utility. Only the second notion of randomized truthfulness has been applied to the problem. Archer 
and Tardos [4] gave a truthful-in-expectation mechanism with approximation ratio 3, that was later 
improved to 2 [3]. Recently, Dhangwatnotai et al. [7] settled the status of the randomized version
of the problem by giving a randomized PTAS that is truthful-in-expectation. Both mechanisms 
apply (among other methods) a randomized rounding procedure. Interestingly, randomization is 
useful only to guarantee truthfulness and has no implication on the approximation ratio. Indeed, 
both algorithms can be easily derandomized to provide deterministic mechanisms that preserve the 
approximation ratio, but violate the monotonicity condition. 

1.2 Our results and techniques 

We provide a deterministic monotone PTAS for $Q\|C_{\max}$. The corresponding payment scheme [4] is
polynomially computable^3, and with these payments our algorithm induces a $(1 + 3\epsilon)$-approximate
deterministic truthful mechanism, settling the status of the problem.

We start by fixing a common basis for our subsequent considerations. We always assume that 
input speeds are indexed so that $s_1 \le s_2 \le \ldots \le s_m$ holds. For any set of jobs $P = \{p_1, p_2, \ldots, p_j\}$,
the weight or workload of the set is $|P| = \sum_{r=1}^{j} p_r$. We will view an allocation of the jobs to the
machines as an (ordered) partition $(P_1, P_2, \ldots, P_m)$ of the jobs into m sets. We search for an output
where the workloads $|P_i|$ are in non-decreasing order.

The PTAS [8], which is a simplified and polished version of the very first PTAS [9], defines a
directed network on m + 1 layers depending on the input job set, where each arc leading between
the layers $i - 1$ and $i$ represents a possible realization of the set $P_i$, and directed paths leading over
the m layers correspond to the possible job partitions. An optimal solution is then found using a 
shortest path computation in this network. 

The difficulty in applying any known PTAS to construct a deterministic monotone algorithm for 
$Q\|C_{\max}$ is twofold. First, in all of the known PTASs, sets of input jobs of approximately the same
size form groups, such that in the optimization process a common (rounded or smoothed) size is assumed
for all members of the same group. Second, jobs that are tiny compared to the total workload of a 
machine do not turn up individually in the calculations, but just as part of an arbitrarily divisible 
(e.g., in form of small blocks) total volume. 

Note that it should be relatively easy to find an allocation procedure that is in a way 'approximately
monotone'. However, (exact) monotonicity intuitively requires exact determination and
knowledge of the allocated workloads. To illustrate this, we just point out that in every monotone
(in expectation) algorithm for $Q\|C_{\max}$ provided so far, the (expected) workloads either occur in
increasing order with respect to increasing machine speeds, or constitute a lexicographically minimal optimal
solution with respect to a fixed solution set and a fixed machine indexing.

Thus, both of the mentioned simplifications of the input set - which, to some extent, seem 
necessary to admit polynomial time optimization - appear to be condemned to destroy any attempt 
to make a deterministic adaptation monotone. (The authors of [7] used randomization at both 
points to obtain the monotone in expectation PTAS.) Our ideas to eliminate the above two sources 
of inaccuracy of the output are the following, respectively: 

^3 This is intuitively clear, since our work curve is a step function with a polynomial number of steps.

1. As for rounding the job sizes, note that grouping is necessary only to reduce the (exponential)
number of different outputs. We can achieve the same goal if for any group of jobs of similar size
we fix the order of jobs in which they appear in the allocation (e.g., in increasing order), and
calculate with the exact job sizes along the optimization process. Notice that it is not even
obvious that such a solution with increasing workloads exists. Now, if reducing a machine speed
increases the makespan of the (previously optimal) solution, that means that this machine became
a bottleneck, so a new upper bound on the optimum makespan over the considered set of outputs
is induced exactly by the (previous) workload of the changed machine (the same argument as used
in [4, 2, 7]). With this idea we derandomize the first type of randomization (job smoothing) of [7].

2. Concerning tiny jobs, we observe that with these we can fill up some of the fastest machines 
nearly to the makespan level. On the other hand, it is easy to show [3] that pre-rounding machine 
speeds to powers of some predefined $(1 + \epsilon)$ does not spoil monotonicity and increases the approximation
bound by only a factor of $(1 + \epsilon)$. Assuming now that the granularity of tiny blocks is much finer
than the granularity of machine speeds, we can be sure that (full) machines of higher speed receive
more work than slower machines. Moreover, having reduced the speed of such a machine, tiny jobs 
in its workload 'flow' to other machines to provide a makespan 'much' smaller than implied by the 
previous workload of this machine. 
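The speed pre-rounding mentioned in item 2 can be sketched as follows. The text does not say in which direction speeds are rounded; rounding down is an assumption of this example, and the name `round_speed_down` is hypothetical.

```python
import math

def round_speed_down(speed, eps):
    """Round `speed` down to the nearest integer power of (1 + eps).

    Rounding down (rather than up) is an assumption of this sketch.
    """
    base = 1.0 + eps
    k = math.floor(math.log(speed) / math.log(base))
    # Guard against floating-point error in the logarithm.
    while base ** (k + 1) <= speed:
        k += 1
    while base ** k > speed:
        k -= 1
    return base ** k
```

A rounded speed differs from the true speed by a factor smaller than 1 + eps, which is where the extra factor in the approximation bound comes from.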

It is quite a technical challenge to combine these two ideas so smoothly that the result admits
a correct monotonicity proof. We accomplish this task as follows. We fix (for the proof argument)
a set $L_i$ of non-tiny jobs on each machine, so that $L_1, L_2, \ldots, L_m$ have increasing and exactly
known weights, and they fulfil the constraints suggested in 1. On top of the sets $L_i$, each machine
has a set $S_i$ of small jobs (due to necessary conditions for rounding the total volume of tiny jobs,
some of these are uniform blocks, while some are known exactly). The total set of small jobs is
flexible (along the proof), in particular we can always move a small job to a higher index machine, 
and obtain a valid schedule. Moreover, we set the objectives so that in an optimum solution the 
small jobs are moved to the higher index machines as much as possible (and so, make them full). 

Our monotonicity proof becomes subtle in case of the first (and so, not necessarily full) machine 
containing small jobs. It is especially so when this first machine is m, not leaving space for ma- 
nipulating the small jobs in the output as needed. In order to circumvent this problem we restrict 
the search to allocations where at least two machines do have some tiny blocks (unless too few 
tiny jobs exist). Moreover, it seems crucial in our monotonicity argument that every machine has 
the possibility to get rid of all the tiny blocks (i.e., those inducing uncertain workload) if this is 
provoked by a reduction of its speed. Combining these two requirements we treat the last three 
machines as a single entity. A carefully optimized assignment of an 'obligatory' set of tiny blocks, 
and later of the actual tiny jobs to these machines then implies monotonicity. 

1.3 Preliminaries 

The input is given by a set $P_I$ of n input jobs, and a vector s (or $\sigma$) of input speeds $s_1 \le \ldots \le s_m$.
For a job $p \in P_I$ we use p both to denote the individual job, and the size of this job in a given
formula. For a desired approximation bound $1 + \epsilon$, we choose a $\delta \ll \epsilon$, that will be the rounding
precision of the job sizes. For ease of exposition, we will assume that $(1 + \delta)^t = 2$ for some $t \in \mathbb{N}$.^4
Furthermore, we define $\rho$ as the unique integer power of 2 in $[\delta/6, \delta/3]$. We use the interval notation
for a set of non-negative integers like, e.g., $[1, m]$. This should cause no confusion, as in such cases
it will always be obvious that we consider integers.
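The two parameters fixed above can be computed as in the following sketch. The function names `delta_for` and `rho_for` are hypothetical, and floating-point numbers stand in for the exact reals of the analysis.

```python
import math

def delta_for(t):
    """Return delta with (1 + delta)**t == 2, for a positive integer t."""
    return 2.0 ** (1.0 / t) - 1.0

def rho_for(delta):
    """Return the largest integer power of 2 that is at most delta / 3."""
    mantissa, exponent = math.frexp(delta / 3.0)  # delta/3 = mantissa * 2**exponent
    # mantissa lies in [0.5, 1), so 2**(exponent - 1) <= delta/3 < 2**exponent.
    return math.ldexp(1.0, exponent - 1)
```

Since the returned power of 2 is more than half of delta/3, it always lies between delta/6 and delta/3.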

^4 This assumption is unrealistic for computations, but it is not necessary for the result to hold. We could equally
well use the rounding function of [8] or [7]. However, this would overload the paper with clumsy technicalities, e.g.,
in Definition 2. Also, since our result is of purely theoretical interest, we do not try to optimize the ratio $\delta/\epsilon$; it will
be clear that, e.g., $30\delta < \epsilon$ suffices in the proofs.
Definition 1 (job classes). If p denotes (the size of) a job, then $\bar{p}$ denotes this job rounded up to
the nearest integral power of $(1 + \delta)$. A job p is in the job class $C_l$ iff $\bar{p} = (1 + \delta)^l$.

Let $C_l = \{p_{l1}, p_{l2}, \ldots, p_{l n_l^{\max}}\}$ be the jobs of $C_l$ in some fixed non-decreasing order of size.
We use the notation $C_l(a) = \{p_{l1}, \ldots, p_{la}\}$ for $0 \le a \le n_l^{\max}$, and $C_l(a, b) = C_l(b) \setminus C_l(a)$ for
$0 \le a \le b \le n_l^{\max}$.
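The rounding of Definition 1 and the prefix notation for job classes can be illustrated as follows. The sketch assumes that $(1 + \delta)^t = 2$, so that a class index l corresponds to the rounded size $2^{l/t}$; all function names are hypothetical and the small tolerance is a pragmatic guard against floating-point error.

```python
import math
from collections import defaultdict

def job_class(p, t):
    """Index l of the class of a job of size p > 0, where (1 + delta)**t == 2.

    l is the smallest integer with p <= (1 + delta)**l = 2**(l / t).
    """
    return math.ceil(t * math.log2(p) - 1e-12)  # tolerance for float error

def rounded_size(p, t):
    """Size of p rounded up to the nearest integer power of (1 + delta)."""
    return 2.0 ** (job_class(p, t) / t)

def build_classes(jobs, t):
    """Group jobs by class, each class sorted in non-decreasing order of size."""
    classes = defaultdict(list)
    for p in jobs:
        classes[job_class(p, t)].append(p)
    for members in classes.values():
        members.sort()
    return classes

def class_slice(classes, l, a, b=None):
    """C_l(a) if b is None, otherwise C_l(a, b) = C_l(b) minus C_l(a)."""
    members = classes[l]
    return members[:a] if b is None else members[a:b]
```

With t = 1 the classes are simply the dyadic intervals, so a job of size 3 is rounded to 4 and shares class 2 with jobs of size 2.5 and 4.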

If $P = \{p_1, p_2, \ldots, p_j\}$ is a job set, then $\bar{P} = \{\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_j\}$ denotes the corresponding set of
rounded jobs. The weight or workload of P is $|P| = \sum_{r=1}^{j} p_r$; the rounded weight is $|\bar{P}| = \sum_{r=1}^{j} \bar{p}_r$.
Assuming that the jobs are in non-increasing order of size, we denote the subset of the r largest
jobs by $P^{r} = \{p_1, p_2, p_3, \ldots, p_r\}$.

2 Canonical allocations 

This section characterizes the type of allocations (we call them canonical allocations) that we
will consider. Definitions 2 and 3 describe the necessary restrictions on the (output) job partition
$P_1, \ldots, P_m$. Subsequently, as our first main result, Theorem 1 states that for any input, and any
$\delta > 0$, a canonical allocation exists that provides a $1 + 3\delta$ approximation to the optimum makespan.

Definition 2 ($\delta$-division). We say that a given set of jobs P is $\delta$-divided into the pair of sets (L, S)
(or P = (L, S)) if

(D1) $P = L \cup S$ and $L \cap S = \emptyset$,

(D2) $p > \frac{\delta}{(1+\delta)^3} |L|$ for every $p \in L$, and

(D3) $q \le \delta |L|$ for every $q \in S$.
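A check of the conditions of Definition 2 might look as follows. The coefficient $\delta/(1+\delta)^3$ in (D2) and the non-strict inequality in (D3) are the readings assumed in this sketch; (D1) holds by construction because L and S are passed as separate lists. The name `is_delta_division` is hypothetical.

```python
def is_delta_division(L, S, delta):
    """Check (D2) and (D3) for disjoint job lists L (large part) and S (small part).

    Assumed readings: (D2) p > delta / (1 + delta)**3 * |L| for every p in L,
                      (D3) q <= delta * |L| for every q in S.
    """
    weight_L = sum(L)
    lower = delta / (1.0 + delta) ** 3 * weight_L  # every job of L must exceed this
    upper = delta * weight_L                       # no job of S may exceed this
    return all(p > lower for p in L) and all(q <= upper for q in S)
```

For example, with delta = 0.5 the pair L = [10, 9], S = [1, 0.5] passes, while moving the job of size 1 into L violates (D2).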

Definition 3 (canonical allocation). For a given input, an allocation $P_1, P_2, \ldots, P_m$ is called canonical,
if for every $i \in [1, m]$, the set $P_i$ can be $\delta$-divided into $(L(P_i), S(P_i))$ (or $(L_i, S_i)$, for short),
so that the following properties hold:

(A1) If $i < i'$, then $|L_i| \le |L_{i'}|$.

(A2) for jobs p and q of the same job class, $p \le q$ holds if and only if

(a) $p \in L_i$ and $q \in S_{i'}$ for some $i, i' \in [1, m]$, or

(b) $p \in L_i$ and $q \in L_{i'}$ and $i < i'$, or

(c) $p \in S_i$ and $q \in S_{i'}$ and $i < i'$.

In the proof of Theorem 1, we modify an optimal partition of the rounded input jobs $\bar{P}_I$ to get
the canonical allocation: First we take the core set of each set in the partition (see Definition 4),
then we order the sets by increasing order of core size, and apply Lemma 1 to make the modified
cores fulfil property (A2) (b). It is easy to show that small jobs (those outside the cores) can be
shifted to fast machines, where they remain small, and so still induce a $\delta$-division on each machine.
First, we start with the definition of the core, and then we proceed with Lemmata 1 and 2, that 
are important ingredients of the proof of Theorem 1. 

Definition 4. Given a set P of jobs, we define the core cr(P) of P as follows. Consider the jobs
$P = \{p_1, p_2, \ldots\}$ in a fixed non-increasing order of size. Let j be minimum with the property that
$p_j < \frac{\delta}{1+\delta} |P^{j-1}|$; then $cr(P) := P^{j-1} = \{p_1, \ldots, p_{j-1}\}$. If no such j exists, then $cr(P) := P$.
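The core of a job set can be computed with a single pass over the jobs in non-increasing order. The sketch below reads the threshold of Definition 4 as $\delta/(1+\delta)$ times the weight of the larger jobs, with a strict inequality, which is the reading consistent with the proof of Lemma 2; the function name `core` is an assumption of this example.

```python
def core(jobs, delta):
    """Return cr(P) for the job sizes in `jobs`, following Definition 4."""
    ordered = sorted(jobs, reverse=True)   # non-increasing order of size
    threshold = delta / (1.0 + delta)
    prefix_weight = 0.0                    # weight of the jobs preceding the current one
    for j, p in enumerate(ordered):
        if p < threshold * prefix_weight:
            return ordered[:j]             # the first job that is too small ends the core
        prefix_weight += p
    return ordered                         # no such job: the core is all of P
```

For the sizes 10, 9, 1, 0.5 and delta = 0.5, the job of size 1 is the first one below a third of the preceding weight 19, so the core is {10, 9}.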
Lemma 1. Let $(Q_1, \ldots, Q_m)$ be a partition of a subset Q of the input jobs such that $|\bar{Q}_1| \le \ldots \le
|\bar{Q}_m|$. There exists a partition $(L_1, \ldots, L_m)$ of Q that satisfies:

1. $|L_1| \le \ldots \le |L_m|$;

2. for any job class $C_l$, if job $p_{lj}$ belongs to $L_i$ and job $p_{lk}$ to $L_{i'}$ where $i < i'$, then $j < k$;

3. for all i, $\frac{|\bar{Q}_i|}{1+\delta} \le |L_i| \le |\bar{Q}_i|$.

Proof. Let $Q_i \subseteq R$. We say that we maximize $Q_i$ with respect to R, if for every class l we replace the jobs in
$Q_i \cap C_l$ by the largest possible jobs in $R \cap C_l$ (i.e., if there are r such jobs then with the r largest
jobs of $R \cap C_l$). We will denote the maximized set by $Q_i^R$. Clearly, if $S \subseteq R$, then $|Q_i^S| \le |Q_i^R|$.

Now we construct the new partition recursively. We define $L_m$ as a set of maximum workload
among $\{Q_1^Q, \ldots, Q_m^Q\}$ (notice that the latter is not a partition of a subset of the input
jobs). Assuming that $L_m = Q_k^Q$, now $L_{m-1}$ is defined to be a set of maximum workload among
$\{Q_1^{Q \setminus L_m}, \ldots, Q_{k-1}^{Q \setminus L_m}, Q_{k+1}^{Q \setminus L_m}, \ldots, Q_m^{Q \setminus L_m}\}$, etc. In every recursive step we select a set that has
larger weight than any other remaining set, even if those sets get the largest remaining jobs of the
respective classes. This proves 1., whereas 2. holds by construction.

Next we argue that 3. holds as well. Observe that $\{\bar{Q}_1, \bar{Q}_2, \ldots, \bar{Q}_m\}$ and $\{\bar{L}_1, \bar{L}_2, \ldots, \bar{L}_m\}$ (as
sets) are exactly the same. The proof of
$$\frac{|\bar{Q}_i|}{1+\delta} \le |L_i| \le |\bar{Q}_i|$$
is simply the fact that there exist at least i job sets among the $\bar{Q}_h$ so that $|\bar{Q}_h| \le (1+\delta)|L_i|$ (namely,
the sets of rounded jobs $\bar{L}_1, \bar{L}_2, \ldots, \bar{L}_i$); on the other hand there exist at least $m - i + 1$ sets among
the $\bar{Q}_h$ so that $|L_i| \le |\bar{Q}_h|$ (namely, the sets $\bar{L}_i, \bar{L}_{i+1}, \ldots, \bar{L}_m$). □
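The recursive construction in the proof can be sketched as follows. Each job is encoded as a pair (class index, size), which is an assumption of this example, as is the function name `sort_partition`. Only the number of jobs per class in each original set matters: in every round the remaining set that becomes heaviest when given the largest remaining jobs of its classes is selected and fixed.

```python
from collections import Counter

def sort_partition(parts):
    """parts: list of job lists, each job a (job_class, size) pair.

    Returns L_1, ..., L_m built from the last set backwards, as in the proof.
    """
    # Remaining jobs per class, sorted increasingly (largest at the end).
    pool = {}
    for part in parts:
        for cls, size in part:
            pool.setdefault(cls, []).append(size)
    for sizes in pool.values():
        sizes.sort()

    def maximized(counts):
        # Give the set the largest remaining jobs of each of its classes.
        return [(cls, s) for cls, k in counts.items()
                for s in pool[cls][len(pool[cls]) - k:]]

    remaining = [Counter(cls for cls, _ in part) for part in parts]
    result = []
    while remaining:
        best = max(remaining, key=lambda c: sum(s for _, s in maximized(c)))
        result.append(maximized(best))
        for cls, k in best.items():
            del pool[cls][len(pool[cls]) - k:]   # these jobs are now used up
        remaining.remove(best)
    result.reverse()                             # sets were produced as L_m, ..., L_1
    return result
```

On a small instance the output has non-decreasing workloads and, within each class, smaller jobs on lower-index sets, matching properties 1 and 2 of the lemma.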



Lemma 2. Let P be a set of jobs, then

(c1) $\forall p \in cr(P)$: $p > \frac{\delta}{(1+\delta)^2} |cr(P)|$;

(c2) $\forall q \in P \setminus cr(P)$: $q < \frac{\delta}{1+\delta} |cr(P)|$.

Proof. (c2) is trivially true, since job sizes are non-increasing. By Definition 4 it holds that
$p_{j-1} \ge \frac{\delta}{1+\delta} |P^{j-2}|$. Therefore
$$(1+\delta)^2 p_{j-1} > (1+\delta) p_{j-1} + \delta p_{j-1} \ge \delta (|P^{j-2}| + p_{j-1}) = \delta |P^{j-1}| = \delta |cr(P)|.$$
The same holds for all jobs not smaller than $p_{j-1}$, which proves (c1). □

Theorem 1. For arbitrary increasing input speeds and input jobs, a canonical allocation inducing
a schedule with makespan at most $(1 + 3\delta)$OPT exists, where OPT is the optimum makespan of
the input.

Proof. Let P be the set of all jobs and $s_1 \le \ldots \le s_m$ be the input speeds. We process this set
of jobs in five steps to finally obtain the desired canonical allocation. In the next two steps we
consider only the set of rounded jobs $\bar{P}$.
1. (core division) We start from an optimal schedule of $\bar{P}$. Let this be $(P_1, \ldots, P_m)$ and its
makespan be $M \le (1 + \delta)$OPT. The inequality trivially holds, since any schedule of P induces a
schedule of $\bar{P}$ of makespan increased by a factor of at most $(1 + \delta)$.

Moreover, for every $P_i$ let $S_i = P_i \setminus cr(P_i)$. In the rest of the proof we call jobs in $\bigcup_{i=1}^{m} cr(P_i)$
large, and jobs in $\bigcup_{i=1}^{m} S_i$ small.

2. (core sorting) In this step we start from the schedule $(P_1, P_2, \ldots, P_m)$ with $P_i = cr(P_i) \cup S_i$ and
arrive at a (fractional) schedule $P'_1, \ldots, P'_m$ with $P'_i = L'_i \cup S'_i$, where $L'_1, L'_2, \ldots, L'_m$ is simply
the set of cores $cr(P_i)$ sorted by weight. Each small job might be cut into finitely many parts, and
distributed over the sets $S'_i$. Importantly, $P'$ has makespan at most M.

We define the rearranged sets $S'_i$ of small jobs in the course of sorting $\{P_i\}$ step by step, with
insertion: after step i, $cr(P_1), cr(P_2), \ldots, cr(P_i)$ become sorted by weight, and the jobs of $\bigcup_{h=1}^{i} S_h$
are allocated fractionally to machines $1, 2, \ldots, i$, so that the makespan remains M, and the sets
$P_{i+1}, \ldots, P_m$ remain intact.

Now we explain how we redistribute the small jobs. When we insert the set $P_i = cr(P_i) \cup S_i$ to
some position $k < i$, then the job sets previously on machines $k, k + 1, \ldots, i - 1$ move to the next
higher index machine, where they clearly fit below M. Even though all the jobs in $P_i$ might not fit
on machine k (below M), certainly the jobs of $cr(P_i)$ do. This is because the workload that was
previously on machine k had a core larger than $cr(P_i)$. Moreover, notice that all jobs previously
(before step i) on machines $k, k + 1, \ldots, i$ altogether fit on the same set of machines below M.

Now we don't move large jobs at all, but take the small jobs in the same order as they are
allocated now, starting from (small) jobs on machine $i, \ldots, k$, and continuously 'fill' them to the
machines in the same decreasing order of the machines, cutting a (fractional) job into two when
the time M is reached. (Alternatively, we can just pick the superfluous jobs of $S_i$, and fill them
(fractionally) to empty gaps of machines $k + 1, \ldots, i$.)

Observation. Every (fractional) small job that was previously in $S_i$ now moves to a $P'_h$ with
$|L'_h| \ge |cr(P_i)|$. This implies that (c2) of Lemma 2 still holds (a fractional job fulfils (c2), if its
original full size does). Furthermore, (c1) trivially holds, since we did not change the large sets.

3. (permutation) Now we return to the original job sizes. We replace the rounded jobs in each $L'_i$
by original jobs, so that we use the smallest possible jobs within every class. We want (A1)
and (A2) to hold, so we apply Lemma 1 on the resulting partition. After applying the procedure of
Lemma 1, we obtain $L_1, \ldots, L_m$. By the lemma we know that $\frac{|L'_i|}{1+\delta} \le |L_i| \le |L'_i|$ holds for every
i. This implies, on the one hand, that the makespan is still at most M. On the other hand, (leaving
the $S'_i$ as they were) we obtained $\delta$-divisions $(L_i, S'_i)$: (D2) holds, since for any job p in $L_i$ we have
by (c1), applied to the rounded set $\bar{L}_i$ (which coincides with one of the cores), that
$$p \ge \frac{\bar{p}}{1+\delta} > \frac{\delta}{(1+\delta)^3} |\bar{L}_i| \ge \frac{\delta}{(1+\delta)^3} |L_i|.$$
(D3) holds, because if $q \in S'_i$ then by (c2)
$$q \le \bar{q} < \frac{\delta}{1+\delta} |L'_i| \le \delta |L_i|.$$
4. (small job sorting) Finally, we fill the small jobs continuously on the machines below M, in 
decreasing order of size (still fractional allocation), starting from machine m. This ensures (A2)(c), 
and we claim that it does not spoil (D3) either: if, due to this sorting, now some job q were too
large for the set $L_i$ of the machine, then that would mean that all the small jobs of size at least q
must have been on machines $i + 1, \ldots, m$, (while the large sets were the same), and fit below M,
which is impossible as shown by the ordered allocation. 
5. (integral allocation) We make an integral allocation by assigning every fractional job to the
fastest machine where the job occurs. Note that by the previous construction, every machine gets
at most one such job. This increases the makespan to at most $(1 + \delta)M \le (1 + \delta)^2$OPT, because
we had (and still have) $\delta$-divisions.

□ 

3 Configurations 

Like in [8, 7], we introduce so-called configurations $\alpha(w, \mu, n^0, n^1)$ in order to represent any possible
job set $P_i$ of the partition, up to $\delta$ accuracy. We use the configurations to define the vertices of a
directed graph $\mathcal{H}$. A well-defined optimal path in this graph will then specify our output schedule.^5
The first component of any configuration is a magnitude w which is an integer power of 2.
As we proceed from slow machines to fast machines in a schedule, the monotonically increasing
magnitude keeps track of the largest job size allocated so far, which must be some size in the interval
$(w/2, w]$. Thus, the current magnitude also shows which (larger) job sizes are not yet relevant,
and which (tiny) jobs need not be taken into account individually anymore in the configuration. 
This motivates the next definition. 

Definition 5 (valid magnitude). The value $w = 2^z$ ($z \in \mathbb{Z}$) is a valid magnitude if an input job
$p \in P_I$ exists so that $w/2 < p \le w$. Let $w_{\min}$ and $w_{\max}$ denote the smallest and the largest valid
magnitudes, respectively. We call a job tiny for w if it has size at most $\rho w$.

Recall that $\rho$ is the integer power of 2 between $\delta/6$ and $\delta/3$. Having a magnitude w fixed,
let $\lambda = \log_{(1+\delta)} \rho w = t \cdot \log(\rho w)$, and $\Lambda = \log_{(1+\delta)} w = t \cdot \log w$, where $(1 + \delta)^t = 2$. Notice that
both $\lambda$ and $\Lambda$ are integers, and by Definition 1, the jobs of size in $(\rho w, w]$ belong to the classes
$C_{\lambda+1}, \ldots, C_\Lambda$. These will constitute the relevant job classes, if the largest job size on the current or
slower machines is between w/2 and w.
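Valid magnitudes and the range of relevant class indices can be computed as in the sketch below. The function names are hypothetical; job sizes are floats, and the binary exponent is read off with `math.frexp` so that exact powers of two are handled without rounding problems.

```python
import math

def magnitude_of(p):
    """Return the power of two w with w/2 < p <= w, for a job size p > 0."""
    mantissa, exponent = math.frexp(p)   # p = mantissa * 2**exponent, 0.5 <= mantissa < 1
    if mantissa == 0.5:                  # p itself is a power of two
        return math.ldexp(1.0, exponent - 1)
    return math.ldexp(1.0, exponent)

def valid_magnitudes(jobs):
    """Sorted list of all valid magnitudes for the given job sizes."""
    return sorted({magnitude_of(p) for p in jobs})

def class_range(w, rho, t):
    """Return (lam, Lam) = (t*log2(rho*w), t*log2(w)) for powers of two w and rho."""
    lam = t * round(math.log2(rho * w))
    Lam = t * round(math.log2(w))
    return lam, Lam
```

For instance, with t = 1, w = 4 and rho = 1/4 the pair is (0, 2), so the relevant classes are $C_1$ and $C_2$, which together cover the sizes in (1, 4].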

If the configuration $\alpha$ represents the set $P_i$ in a job partition, then the so-called size vector
$n^0 = (n^0_\lambda, n^0_{\lambda+1}, \ldots, n^0_\Lambda)$ describes the jobs in the cumulative job set $A_{i-1} := \bigcup_{h=1}^{i-1} P_h$ as follows.
For $\lambda < l \le \Lambda$ ($l \ne \mu, \mu + 1$), exactly the first (smallest) $n^0_l$ jobs of the class $C_l$ are in the set $A_{i-1}$.
Moreover, in $A_{i-1}$ the total weight of jobs from $\bigcup_{l \le \lambda} C_l$ is in the interval $((n^0_\lambda - 1) \cdot \rho w, (n^0_\lambda + 1) \cdot \rho w)$.
However, the particular subset of these small jobs inside $A_{i-1}$ is not determined by $\alpha$. The vector $n^1$
represents the set $A_i = \bigcup_{h=1}^{i} P_h$ in the same way.

A major difference to the configurations of [8] is that our configurations should not only represent
a job set $P_i$, but also its $\delta$-division $(L_i, S_i)$. In particular, we will distinguish four types of job
sizes in a configuration. Tiny jobs have size at most $\rho w$, and, as already seen, are represented by
the first coordinates $n_\lambda$ of the two size vectors with their total size rounded to an integer multiple
of $\rho w$. Correspondingly, we will sometimes talk about blocks of size $\rho w$ which are simply re-tailored
tiny jobs for the purposes of our analysis.

Definition 6. Blocks are imaginary tiny jobs, each having size $\rho w$ for some valid magnitude w.
We use $S(n_\lambda, \rho w)$ to denote a set of $n_\lambda$ blocks of size $\rho w$.

Small jobs are those that (together with the tiny jobs), can only appear in the set $S_i$ of the
$\delta$-division, whereas large jobs can only be in the set $L_i$. However, there must exist job classes -

^5 Roughly speaking, our graph can be thought of as the line graph of the graph G defined in [8] (with simple
modifications). That is, the vertices of $\mathcal{H}$ correspond to edges of G. This is the reason why our configurations include
two vectors $n^0$ and $n^1$ instead of only one.
we will call them middle size jobs -, which might occur in both $L_i$ and $S_i$, since by (D2) and
(D3) of Definition 2 there is a flexible border between the job sizes in $L_i$ and in $S_i$. Therefore,
exactly two job classes, $\mu$ and $\mu + 1$, will be represented by a triple of (increasing) non-negative
integers, $n_\mu = (n_{\mu\ell}, n_{\mu m}, n_{\mu s})$, and $n_{(\mu+1)} = (n_{(\mu+1)\ell}, n_{(\mu+1)m}, n_{(\mu+1)s})$, instead of scalar values $n_l$,
and in both of the vectors $n^0, n^1$. In the case of $n^0_\mu$, the meaning of the three numbers will
be that in the set $A_{i-1}$, from the job class $C_\mu$ exactly the jobs in $C_\mu(n^0_{\mu m}, n^0_{\mu s})$ are allocated as
small jobs, that is, to one of the sets $S_1, S_2, \ldots, S_{i-1}$, and exactly the jobs in $C_\mu(n^0_{\mu\ell})$ as large jobs,
i.e., in one of $L_1, \ldots, L_{i-1}$; and similarly in case of $n^1_\mu$ for the set $A_i$. The sets $C_\mu(n^0_{\mu s}, n_\mu^{\max})$ and
$C_\mu(n^0_{\mu\ell}, n^0_{\mu m})$ are to be allocated as small and large jobs, respectively, on higher index machines.
The meaning of the numbers for $\mu + 1$ is analogous. Now we have finished the preparations for the
next two definitions.

Definition 7 (size vector). A size vector $n = (n_\lambda, \ldots, n_\Lambda)$ with middle size $\mu \in [\lambda + 1, \Lambda]$ is a vector
of integers, with the exception of the entries $n_\mu = (n_{\mu\ell}, n_{\mu m}, n_{\mu s})$ and $n_{\mu+1} =
(n_{(\mu+1)\ell}, n_{(\mu+1)m}, n_{(\mu+1)s})$, both of which are themselves vectors of three integers, so that $n_{\mu\ell} \le
n_{\mu m} \le n_{\mu s}$, and $n_{(\mu+1)\ell} \le n_{(\mu+1)m} \le n_{(\mu+1)s}$ holds. All of the integer entries belong to $[0, n]$.

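Definition 8 (configuration). A configuration $\alpha(w, \mu, n^0, n^1)$ consists of four components: a valid
magnitude w, and two size vectors $n^0 = (n^0_\lambda, \ldots, n^0_\Lambda)$, and $n^1 = (n^1_\lambda, \ldots, n^1_\Lambda)$ with middle size $\mu$,
such that

(C1) $n^0_l \le n^1_l \le n_l^{\max}$ for $\lambda < l \le \Lambda$, $l \notin \{\mu, \mu + 1\}$;

(C2) if $w \ne w_{\min}$ then $n^1_l > 0$ for at least one $l \in (\Lambda - t, \Lambda]$;

(C3) $n^0_\lambda \le n^1_\lambda \le \ldots$

(C4) $n^0_{\mu\ell} \le n^1_{\mu\ell} \le n^0_{\mu m} = n^1_{\mu m} \le n^0_{\mu s} \le n^1_{\mu s} \le n_\mu^{\max}$, and analogously for $\mu + 1$;

$T_\alpha = S(n^1_\lambda - n^0_\lambda, \rho w)$,

$L_\alpha = C_\mu(n^0_{\mu\ell}, n^1_{\mu\ell}) \cup C_{(\mu+1)}(n^0_{(\mu+1)\ell}, n^1_{(\mu+1)\ell}) \cup \bigcup_{l=\mu+2}^{\Lambda} C_l(n^0_l, n^1_l)$, and

$S_\alpha = C_\mu(n^0_{\mu s}, n^1_{\mu s}) \cup C_{(\mu+1)}(n^0_{(\mu+1)s}, n^1_{(\mu+1)s}) \cup \bigcup_{l=\lambda+1}^{\mu-1} C_l(n^0_l, n^1_l) \cup T_\alpha$;

(C5) either $n^0 \ne n^1$ and $(1 + \delta)^{(\mu+1)} < \delta \cdot |L_\alpha| \le (1 + \delta)^{(\mu+2)}$;

or $\alpha$ is the empty configuration $(w_{\min}, \lambda_{\min} + 1, 0, 0)$ where $\lambda_{\min} = t \cdot \log(\rho w_{\min})$.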

Notation. We refer to the whole represented job set $L_\alpha \cup S_\alpha$ (including virtual blocks) simply by
$\alpha$ (abusing notation), and $|\alpha|$ stands for the total work of the set $\alpha$. We denote the set without tiny
blocks by $\tilde{\alpha} = \alpha \setminus T_\alpha$.

It is easy to verify that the requirements (C1), (C3) and (C4) are necessary, if we want $n^0$ and $n^1$
to represent cumulative job sets of a partition the way we described above. (C2) implies that w
is always the smallest possible magnitude for representing these job sets. (C5) is different in flavor
from the previous four properties: it implies that the set $L_\alpha$ and $\mu$ strongly affect each other.

9 



pw 



+ 3; 



However, it can be shown (as we do in proving Theorem 2) that for every set $P_i = A_i \setminus A_{i-1}$ (and
corresponding w) in a canonical schedule a unique $\mu > \lambda$ exists that fulfils (C5).

We stress here that the cumulative sets $A_i$ do not possess a $\delta$-division, and a single size vector n
does not represent an (L, S) division at all. On the other hand, any configuration, indeed, represents
a $\delta$-division (see Lemma 3).



4 The directed graph Tij 

In this section, for an arbitrary input instance $I$, we define a directed, layered graph $\mathcal{H}_I$. All vertices of
this graph are configurations, selected, numbered, and 'chained' to form the graph in an appropriate
way.

First, in Section 4.1, for an arbitrary configuration $\alpha$, we define a set $Scale(\alpha)$ of configurations.
These are the possible configurations of an end-vertex of any arc whose starting vertex has $\alpha$
as configuration. We took the name scale from [8], where $scale_{w \to w'}(n) = n'$ is a single size vector
that represents the same set of jobs as $n$ does, from the aspect of some higher magnitude $w'$ than
$w$. Similarly, in our case, if $\alpha = (w, \mu, n^0, n^1)$, $\beta = (w', \mu', n'^0, n'^1)$, and $\beta \in Scale(\alpha)$, then $n'^0$ must
represent the same job set $A_i$ as $n^1$, from the point of view of a (possibly) increased magnitude $w'$
and a (possibly) increased middle size $\mu'$. Next, in Section 4.2, we proceed with the exact definition
of the graph, and finally in Section 4.3, we prove that a path of minimum makespan in this graph
corresponds to an allocation with approximation ratio $1 + O(\delta)$.

4.1 The definition of $Scale(\alpha)$

The exact definition of $Scale()$ might look somewhat technical. Nevertheless, this is mainly due to
the middle sizes $\mu$ and $\mu'$. Disregarding (S2), the conditions below are the natural 'scaling requirements',
as also appeared in [8]. In the definition we will use the notation $\lambda = \log_{(1+\delta)}(\rho w)$, $\Lambda = \log_{(1+\delta)} w$,
$\lambda' = \log_{(1+\delta)}(\rho w')$, and $\Lambda' = \log_{(1+\delta)} w'$.

Definition 9 ($Scale(\alpha)$). Let $\alpha = (w, \mu, n^0, n^1)$ and $\beta = (w', \mu', n'^0, n'^1)$ be two configurations,
where $n^1 = n = (n_\lambda, \ldots, n_\Lambda)$, resp. $n'^0 = n' = (n'_{\lambda'}, \ldots, n'_{\Lambda'})$; then $\beta \in Scale(\alpha)$ iff

(S1) $w \le w'$, and $\mu \le \mu'$;

(52) if li' = n then = and n'^.^^ = n^_,_i; if n < n', then = n^rn and = ; 
if H + l = H' then n(^+i) = n^^; if + 1 < then n(^+i)^ = n(^f,+i)m and n^,^ = n^,^ ; 

For the sake of a concise presentation, in the next three requirements we assume that — f^^si 
and n'^^ij^^^ =^ whenever /x < /i' holds, furthermore '= '^(/i+i)s) and n'^, =^ '^JJi/^j if 

additionally /x -|- 1 < /x' holds. 

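(S3) if $\Lambda < l \le \Lambda'$, then $n'_l = 0$;

(S4) if $\lambda' < l \le \Lambda$, then $n_l = n'_l$;

(S5) if $n_\lambda = 0$, then let $n'_{\lambda'} = \left\lceil \frac{\sum_{l=\lambda+1}^{\lambda'} |C_l(n_l)|}{\rho w'} \right\rceil$. Otherwise let $T_\alpha = n_\lambda \rho w + \sum_{l=\lambda+1}^{\lambda'} |C_l(n_l)|$, and
$n'_{\lambda'}$ be the smallest nonnegative integer such that

$(T_\alpha - \rho w, T_\alpha + \rho w) \subset ((n'_{\lambda'} - 1) \cdot \rho w', (n'_{\lambda'} + 1) \cdot \rho w')$.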
We provide some intuition concerning Definition 9, by comparing $n$ and $n'$, the old and the new
size vectors, respectively. First of all note that if $\alpha$ and $\beta$ represent the consecutive sets $P_i$ and
$P_{i+1}$ of a partition, then, indeed, both of these vectors should represent the same cumulative job
set $A_i = \bigcup_{h=1}^{i} P_h$.

Besides the natural demand of increasing $w$ and $\mu$, which we keep in its simplest form (S1), the
'traditional' scaling requirements are (S3) to (S5). By (S3) and (S4), job classes not appearing in
$n$ do not occur in $A_i$, whereas those appearing explicitly in both $n$ and $n'$ must be represented by
the same number in both size vectors.

Less obvious is (S5). By the first condition we want to achieve that $n'_{\lambda'} = 0$ if and only if
no jobs of size at most $\rho w'$ have been allocated in $A_i$. Now, by an inductive argument, the total
size of tiny jobs in $A_i$ must be between $(n_\lambda - 1)\rho w$ and $(n_\lambda + 1)\rho w$. During scaling to $w'$ we shift
this interval by the exact workload of jobs that become tiny right now, and so obtain the interval
$(T_\alpha - \rho w, T_\alpha + \rho w)$. Now $n'_{\lambda'} \cdot \rho w'$ has to be the midpoint of a new (longer) interval containing
$(T_\alpha - \rho w, T_\alpha + \rho w)$ as a subset. For $w' = w$ we clearly obtain $T_\alpha = n_\lambda \rho w$, and so $n_\lambda = n'_{\lambda'}$. Assume
now that $\rho w = 1$ and $\rho w' = 2$. Observe that any interval of length 2 (i.e., $(T_\alpha - \rho w, T_\alpha + \rho w)$) either
contains an integer multiple of 2 or has it as a (lower) endpoint. This will be $n'_{\lambda'} \cdot 2 = n'_{\lambda'} \cdot \rho w'$, the
middle of the larger interval (here of length 4) that covers the original interval completely. Since
the new interval can also be covered by a properly positioned interval of length 8, and so on, this
proves that also for $\rho w' = 4, 8, 16, \ldots$ an appropriate $n'_{\lambda'}$ exists. Here we exploited that the
magnitudes and $\rho$ are exact powers of 2.
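
The second case of (S5) can be checked numerically. The following is a minimal sketch, assuming the containment of the two open intervals is tested through their endpoints; the function name `scaled_tiny_blocks` and its arguments are illustrative and do not appear in the paper.

```python
from fractions import Fraction
from math import ceil


def scaled_tiny_blocks(t_alpha, old_block, new_block):
    """Smallest n' >= 0 with (T - a, T + a) inside ((n' - 1) b, (n' + 1) b).

    t_alpha   -- T_alpha, total tiny work seen from the old magnitude
    old_block -- a = rho * w, the old block size
    new_block -- b = rho * w', the new block size
    """
    t = Fraction(t_alpha)
    a = Fraction(old_block)
    b = Fraction(new_block)
    # Right endpoints: t + a <= (n' + 1) * b  <=>  n' >= (t + a) / b - 1.
    n_new = max(0, ceil((t + a) / b - 1))
    # Left endpoints: (n' - 1) * b <= t - a. If the smallest candidate
    # violates this, every larger candidate violates it as well.
    if (n_new - 1) * b > t - a:
        raise ValueError("no integer n' covers the interval")
    return n_new
```

With block sizes that are powers of 2, as in the text, `scaled_tiny_blocks(3, 1, 1)` leaves the count at 3, while `scaled_tiny_blocks(3, 1, 2)` gives 1, since the interval (2, 4) lies inside (0, 4).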

Finally, we turn to the meaning of (S2). As long as $\mu$ remains a middle size in the new size
vector $n'$, the same triple $n_\mu$ represents the set of jobs allocated in $A_i$ as small resp. as large jobs,
from the class $C_\mu$. If $\mu$ becomes smaller than the new middle size $\mu'$, that means that the jobs of
$C_\mu(n_{\mu m})$, that have to be allocated as large jobs, have already been allocated, that is, $n_{\mu\ell} = n_{\mu m}$
and so $C_\mu(n_{\mu m}) = C_\mu(n_{\mu\ell}) \subseteq A_i$. Moreover, $C_\mu(n_{\mu s})$ are now all the jobs allocated from this class,
so we can define (the scalar) $n_\mu = n_{\mu s}$. Similarly, if $\mu'$ is not a middle size in the old vector $n$,
then no jobs of class $C_{\mu'}$ have been allocated as small up to the set $A_i$, and this is expressed by
$n'_{\mu' m} = n'_{\mu' s}$, and by the notation $n'_{\mu'} := n'_{\mu'\ell}$. The considerations for $\mu + 1$ and for $\mu' + 1$ are
analogous.

4.2 The graph $\mathcal{H}_I$

The vertices of $\mathcal{H}_I$ (i.e., the configurations) are arranged in $m$ layers, and in levels $I$ and $II$, which
are orthogonal to the layers. The configurations on level $I$ must have an empty set of small jobs,
i.e., $S_\alpha = \emptyset$, and here the layers $\{m-2, m-1, m\}$ are empty. Level $II$ has $m-1$ 'real' layers, and
we add a single dummy vertex $v_m$ adjacent to every vertex on layer $m-1$, that alone forms the
last layer $m$.^6 In general, the $i$th layer stands for the $i$th set $P_i$. Any directed path of $m$ nodes leads
to $v_m$ over the $m$ layers, and from level $I$ (or $II$) to level $II$. Such a path we will call an $m$-path.
The $m$-paths will represent partitions of the input $P_I$.

For a given $m$-path, the very first vertex on level $II$ is in some layer $k \le m-2$; we will call
it the switch vertex, and $k$ the switch machine or switch index. (Note that $k$ is thus the first
machine possibly receiving small jobs.) We shall denote the sets of vertices on the two levels by $V_I$ and
$V_{II}$, respectively.

Notation. For any directed path $(v_1, v_2, \ldots, v_r)$, the corresponding configurations of the nodes will
be denoted by $(\alpha_1, \alpha_2, \ldots, \alpha_r)$.

^6 More precisely, we will unite layers $m-2$ and $m-1$, and use double vertices in the united layer, but it simplifies
the discussion to think of these as pairs of individual vertices.
At this point, let us briefly discuss the last set of the partition. Note that for an
$m$-path, the last configuration $\alpha_{m-1}$ alone represents $\bigcup_{h=1}^{m-1} P_h$. Thus, we can use $\alpha_{m-1}$ to uniquely
determine the 'hidden configuration' $\alpha_m$ (not appearing explicitly in the path). We define $\alpha_m$
as follows: let $w_m = w_{m-1}$, $\mu_m = \mu_{m-1}$, and $n^0$ be the second size vector in $\alpha_{m-1}$ extended by

entries on (A^_i, Amax]- Furthermore, n\ = '^^ ' + 3; n\g = n™^ and = n]^ for 

1 G + 1}, and nj = n™^ if I ^ {A, /it, + 1}. Observe, that am represents all jobs of class 
higher than $\Lambda_{m-1}$ (the algorithm can handle this job set as a huge chunk without violating the
running time bounds).^7 Keeping $w_{m-1} = w_m$ plays an important role in the monotonicity proof.

Furthermore, also due to the monotonicity requirement, we want to handle even the last three
workloads $\alpha_{m-2}$, $\alpha_{m-1}$, and $\alpha_m$ together. In particular, we will require that either all of them have
the same magnitude, and therefore use $w'_{m-1} = w_{m-2}$ instead of $w_{m-1}$, or that $w_{m-2}$ is much
smaller than $w_{m-1}$, so that all jobs on $m-2$ (if any exist) are tiny for machines $m-1$ and $m$ (cf. cases
(A) and (B) below).

We define the graph so that every m-path should represent a canonical allocation, as defined 
in Section 2. Beyond that, we impose further restrictions on the paths that we consider; these 
restrictions can also be reflected in the graph definition. Moreover, for a given speed vector, 
any m-path will have a naturally defined makespan value. Among the m-paths adhering to the 
restrictions, the algorithm selects an m-path having minimum makespan, as the primary objective. 
Among paths of minimum makespan, we maximize the index of the switch machine k. A further 
order of preference, and restriction to be of type (A) or (B) is the following. Observe that in case 
(A) on the last three machines, resp. in case (B) on the last two machines the block-size for tiny 
jobs is the same. 

(A) Wm-2 > ■ Wm-i', in this case we modify the last magnitude to be := Wm-2, and 
require \am-2\ < \am-i\ < \^m\, and \am-2\ < \(Xm-i\ < \ctm\; moreover, 

(i) either all tiny jobs (measured by blocks) are on machines m — 1 and m, or 

(ii) machines {m — 2, m — 1, m} have at least 18 tiny blocks, and at least two of them have 
each at least 6 tiny blocks. 

(B) Wm,-2 < ■ Wm-i; then lam-il < \am\, and \am-i\ < \(Xm\, and 

(i) all machines but {m — l,m} are empty, or 

(ii) m — 1 and m together have at least 6 tiny blocks. 

The requirements (A) and (B) can be incorporated in the graph, e.g., by using (polynomially 
many) special double vertices v'^_2 with double configurations (0^-2, «m-i) on level II. Applying 
:= Wm-2 > P^ -Wm-i Can be done by using size vectors of triple length for the double vertices 
of type (A). Clearly, all restrictions can be represented by the configurations {am-2-,OLm-i)- 

The subsequent definition of graph $\mathcal{H}_I$ is independent of the speed vector $s$, and depends only
on the job set $P_I$. After that, we assign a weight to each vertex, called finish time, and define the
makespan of a path accordingly. Obviously, these values do depend on the machine speeds $s$.

We assume, w.l.o.g., that $m \ge 3$, otherwise we include a machine of speed 0.

Definition 10 (graph $\mathcal{H}_I$). $\mathcal{H}_I(V, E)$ is a directed graph, where every vertex $v \ne v_m$ is a triple
$v = (d, i, \alpha)$, so that $d \in \{I, II\}$, $i$ is an integer in $[1, m-1]$, and $\alpha = (w, \mu, n^0, n^1)$ is a configuration.
In particular, each triple that obeys the rules (V1) to (V4) below determines a vertex in $V$.



^7 Because of $\mu_{m-1} = \mu_m$, we can only require $(1+\delta)^{\mu+1} \le \delta \cdot |L_{\alpha_m}|$ instead of property (C5) of configurations.
As a consequence, on the last machine the division $(L_m, S_m)$ does not fulfil (D2).
(V1) if $i = 1$, then for $l \ne \mu, \mu+1$: $n^0_l = 0$, while for $l \in \{\mu, \mu+1\}$: $n^0_{l\ell} = 0$ and $n^0_{ls} = n^0_{lm}$;
(V2) if $d = I$, then $i \in [1, m-3]$, and $S_\alpha = \emptyset$;

(V3) if d = II, then i G [1, m — 3], and n\ < max{ra^ 

There is an arc from $v = (d, i, \alpha)$ to $v' = (d', i+1, \beta)$ if and only if
(E1) $\beta \in Scale(\alpha)$, and
(E2) $|L_\alpha| \le |L_\beta|$, and
(E3) $d \le d'$;

(V4) Finally, for $d = II$, and combined layers $(m-2, m-1)$, include double vertices $v'$ with double
configurations $(\alpha_{m-2}, \alpha_{m-1})$ so that for $(\alpha_{m-2}, \alpha_{m-1}, \alpha_m)$ the requirements (E1), (E2) and
either (A) or (B) hold.

Definition 11 (finish time of a vertex). Let $v = (d, i, \alpha)$ be a vertex of $\mathcal{H}_I$, where $\alpha = (w, \mu, n^0, n^1)$.
The finish time of $v$ is then $f(v) = \frac{|\alpha| + \rho w}{s_i}$ if $n^0_\lambda < n^1_\lambda$, and $f(v) = \frac{|\alpha|}{s_i}$ otherwise.

Definition 12 (makespan of a path). Let $Q = (v_1, v_2, \ldots, v_r)$ be a directed path in $\mathcal{H}_I$. If $Q \subset V_I$,
or $Q \subset V_{II}$, then the makespan of $Q$ is $M(Q) = \max_{h=1}^{r} f(v_h)$. If $Q$ is an $m$-path with switch
vertex $v_k = (II, k, \alpha_k)$, then $M(Q) = \max\{\frac{|\alpha_k|}{s_k}, \max_{h \ne k} f(v_h)\}$.

Definition 12 allows $v_r = v_m$. The finish time $f(v_m)$ is calculated from the hidden configuration
$\alpha_m$ uniquely determined by $v_{m-1}$.
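
As an illustration of Definitions 11 and 12, here is a small sketch. It assumes the reading that a vertex with tiny blocks ($n^0_\lambda < n^1_\lambda$) is charged one extra block $\rho w$, and that the switch vertex is charged $|\alpha_k|/s_k$ without it. The tuple layout and the function names are hypothetical, chosen only for this example.

```python
def finish_time(work, block, speed, has_tiny_blocks):
    """f(v): add one extra tiny block to the work if the vertex has tiny blocks."""
    extra = block if has_tiny_blocks else 0.0
    return (work + extra) / speed


def path_makespan(vertices, switch_index=None):
    """M(Q) for a path given as tuples (work, block, speed, has_tiny_blocks).

    switch_index is the 0-based position of the switch vertex, or None for a
    path that stays within one level (then M(Q) is the maximum finish time).
    """
    values = []
    for h, (work, block, speed, has_tiny) in enumerate(vertices):
        if h == switch_index:
            # Assumption: the switch machine is measured without the extra block.
            values.append(work / speed)
        else:
            values.append(finish_time(work, block, speed, has_tiny))
    return max(values)
```

For example, with `vertices = [(4.0, 1.0, 2.0, False), (6.0, 1.0, 2.0, True), (9.0, 1.0, 3.0, True)]` the second vertex contributes 3.0 when it is the switch vertex and 3.5 otherwise.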

4.3 Approximation ratio of minimum-cost path 

Theorem 2, saying that an $m$-path having (path-) makespan close to the optimum makespan of the
scheduling problem always exists, is a consequence of Theorem 1. The proof is rather straightforward,
and requires a technical translation of real schedules to $m$-paths of $\mathcal{H}_I$, which involves
creating blocks of size $\rho w_i$ from the actual tiny jobs. In order to prove Theorem 2, we will make
use of the following two technical lemmata.

Lemma 3. For any configuration $\alpha$, the sets $(L_\alpha, S_\alpha)$ form a $\delta$-division of the set $\alpha$.

Proof. The smallest job that might occur in L^ is at least from the class C^, therefore p > {1 +S)^ 
for any p e La- This imphes ^ • (1 -|- 5)^ > (1 -|- 5)^*+^ > 5 ■ \La\, by the property (C5), and thus, we 
obtained (D2) for {La,Sa). 

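Similarly, for any $q \in S_\alpha$, we have $q \le (1+\delta)^{\mu+1}$. This proves (D3), since $(1+\delta)^{\mu+1} \le \delta \cdot |L_\alpha|$,
by (C5). Obviously, $L_\alpha \cap S_\alpha = \emptyset$, and $\alpha = L_\alpha \cup S_\alpha$, so (D1) holds, and $(L_\alpha, S_\alpha)$ is, indeed, a
$\delta$-division. □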

Lemma 3 implies that tiny blocks in any $\alpha_i$ are small w.r.t. $|L_{\alpha_i}|$. The following observation sets
a more exact bound on the block size.

Observation. For any (non-empty) configuration $\alpha_i = (w, \mu, n^0, n^1)$ in an $m$-path,

$\frac{3}{2} \rho w < \delta |L_{\alpha_i}|$. (1)



pw 



1}; 
The observation holds, since $w/2 < |L_{\alpha_h}|$ for some $h \le i$, according to (C2), (E1) and the
requirements of $Scale()$. By (E2), $|L_{\alpha_h}| \le |L_{\alpha_i}|$. Now we have $\rho \le \delta/3$, and $w/2 < |L_{\alpha_i}|$,
so that $\frac{3}{2}\rho w \le \delta \cdot w/2 < \delta |L_{\alpha_i}|$.

In the following proof(s), we will frequently say we 'put' small jobs from one machine to another,
although we are actually modifying some $m$-path $Q$ to obtain another path $Q'$. Technically, this
can be done as follows. Suppose that we put one job from class $l > \lambda$ from machine $i$ to $i+1$,
where $\alpha(w, \mu, n^0, n)$ and $\beta(w', \mu', n', n'^1)$ with smallest job classes $\lambda$ and $\lambda'$ are the configurations
of $v_i$ and $v_{i+1}$, respectively. In $\alpha$ we reduce $n_l$ by 1. As for $\beta$, if $l > \lambda'$, then we reduce $n'_l$ by 1, and
we are done. If $l \le \lambda'$, then we scale the reduced size vector $n$ according to (S5) in order to obtain
the new $n'_{\lambda'}$. In this way, $n'_{\lambda'}$ either reduces by 1, or keeps its original value. In the latter case we
can say that $\beta$ swallowed the job. If we put a job from $i$ onto $h > i$, we can repeatedly apply the
above changes to the configurations, until the job gets swallowed, or we arrive at machine $h$. After
such an act (given that we obtain a canonical allocation again), we arrive at an allocation that is
represented by a corresponding $m$-path in the graph.

Theorem 2. For every input $I = (P_I, s)$ of the scheduling problem, the optimal makespan over
all $m$-paths in $\mathcal{H}_I$ is at most $OPT \cdot (1 + O(\delta))$, where $OPT$ denotes the optimum makespan of the
scheduling problem.

Proof. Recall that $w_{\max}$ is the largest valid magnitude. We collect tiny jobs from $P_I$ into a set $T$
starting from the smallest job, and proceeding in increasing order of job size. We stop collecting
if either $|T| \ge 18\rho w_{\max}$, or the next job has size more than $\rho w_{\max}$.

Let $P^0 := P_I \setminus T$.

According to Theorem 1, a canonical allocation $P^0_1, \ldots, P^0_m$, $P^0_i = (L_i, S_i)$ of the jobs in $P^0$
exists with makespan of at most $OPT(1 + 3\delta)$. We modify this allocation $P^0_1, \ldots, P^0_m$ step by step,
and finally obtain an appropriate path in $\mathcal{H}_I$.

First, we shift small jobs in $S_{m-2} \cup S_{m-1} \cup S_m$ to the right so that $|\alpha_{m-2}| \le |\alpha_{m-1}| \le |\alpha_m|$
holds, increasing the makespan by at most $3\delta \cdot OPT$.

Now we define the magnitude $w_i$ for each $i < m$ to be the smallest power of 2 that is at least
$\max\{p \mid p \in \bigcup_{h=1}^{i} P^0_h\}$. For $m$ let $w_m := w_{m-1}$. Because of (A1), now $|L_i| > w_i/2$, and inequality
(1) holds for $L_i$. In turn, (D2) and (1) imply that jobs of size at most $\rho w_i$ can only be in $S_i$ (and
not in $L_i$). (Note that (D2) and (D3) admit that the largest job in $\bigcup_{h=1}^{i} P^0_h$ appears in some $S_i$.
However, the previous sentence implies that it cannot belong to $T_i$. Therefore, after the subsequent
modification, it remains in its original set $P^0_i$; that is, the defined magnitudes $w_i$ remain consistent.)
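
The choice of magnitudes in this step can be sketched as follows. This is only an illustration: the helper names are hypothetical, job sizes are taken to be positive numbers, and an empty prefix is reported as `None` here, whereas the paper presumably falls back to the smallest valid magnitude.

```python
def smallest_power_of_two_at_least(x):
    """Smallest power of 2 (negative exponents allowed) that is >= x."""
    if x <= 0:
        raise ValueError("job sizes must be positive")
    w = 1.0
    while w < x:
        w *= 2.0
    while w / 2.0 >= x:
        w /= 2.0
    return w


def prefix_magnitudes(job_sets):
    """Return [w_1, ..., w_m] for job_sets = [P_1, ..., P_m] (lists of sizes).

    For i < m, w_i covers the largest job in P_1, ..., P_i; the last
    magnitude copies its predecessor, mirroring w_m := w_{m-1}.
    """
    magnitudes = []
    largest = 0.0
    for jobs in job_sets[:-1]:
        largest = max([largest, *jobs])
        if largest > 0:
            magnitudes.append(smallest_power_of_two_at_least(largest))
        else:
            magnitudes.append(None)  # assumption: empty prefix left undefined
    magnitudes.append(magnitudes[-1] if magnitudes else None)
    return magnitudes
```

For `prefix_magnitudes([[3], [5, 1], [], [9]])` the first three magnitudes are 4, 8, 8, and the last machine inherits 8 even though it holds a job of size 9.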

As the next step, we allocate the set T of tiny jobs to the fastest machine, increasing the 
workload of m by at most ISpWmaxj and the makespan by at most 126 ■ OPT, (cf. (1)). Moreover, 
if Wm-2 < p^Wm-i, then either all machines i < m—2 are empty, or m — 2 is non-empty, meaning 
that T has at least 18 pw^^ax > 18pWm-i jobs of size at most t/^m— 2? 

which are jobs tiny for Wm-i, so 

(B) holds. If Wm-2 > P^Wm-i, then cither T contains all jobs tiny for Wm-2, so that (A)(i) holds, 
or T has enough tiny jobs so that (A)(ii) holds. In the latter case we distribute T over machines 
m — 1 and m. Let us denote the current partition of P/ by Pi, P2, . . . , Pm- 

In what follows, we modify this partition so that it contains an integer number of tiny blocks
instead of the tiny jobs for every $i \ne m$.

Let $T_i = \{p \in P_i \mid p \le \rho w_i\}$ be the set of tiny jobs in $P_i$. The jobs in $T_i$ make $|T_i|/(\rho w_i)$
(fractional) blocks. We can build integral blocks out of these for every $i < m$ by a simple procedure,
also described in [7], which packs (possibly) fractional tiny jobs from a fractional block on some
machine $i$ into a fractional block on machine $h > i$, until one of them gets rounded to an integer
number of blocks. Note that the (full size of) any repacked job remains tiny on its new machine.
We stop this process if there is just one machine $i < m$ left with a fractional block, and put this
fractional block on machine $m$. Note that every machine $i$ received (fractional) jobs of (full) size
at most $\rho w_i$, and the workload increased also by at most $\rho w_i$. The resulting job partition is called

For each $P'_i$ we define a configuration $\alpha_i$ in a recursive manner. Let $\lambda_i = \log_{(1+\delta)} \rho w_i$, and
$\Lambda_i = \log_{(1+\delta)} w_i$. If $P'_i = \emptyset$, then let $\alpha_i$ be the empty configuration (see (C5)). Otherwise we
calculate the unique $\mu_i$ s.t. (C5) holds for $L_i$. (1) implies $\rho w_i < (1+\delta)^{\mu_i}$, that is, $\lambda_i < \mu_i$. On the
other hand, since $L_i$ contains at least one job $p$ s.t. $w_i/2 < p \le w_i = (1+\delta)^{\Lambda_i}$, we have $\mu_i < \Lambda_i$.

For $\alpha_m$ we define $\mu_m := \mu_{m-1}$.

The jobs in $P'_i$ are either blocks of size $\rho w_i$, or have size in $(\rho w_i, w_i]$. Let $n^0$ of $\alpha_i$ be the null
vector (cf. (V1)) if $i = 1$, and be the second size vector in $\alpha_{i-1}$ scaled to $w_i$ if $i > 1$. The $n^1$ of $\alpha_i$ we
can construct so that $L_{\alpha_i} = L_i$, and $S_{\alpha_i} = S'_i$. Here we exploit that every job in $S'_i$ is at most $\delta|L_i|$,
and so it is in the $l$th class where $l \le \mu + 1$; similarly, that every job in $L_i$ is in some class $l \ge \mu$;
and finally, that (A2) facilitates the consistent definition of the size vector coordinates. In the end,
we can define a double configuration $(\alpha_{m-2}, \alpha_{m-1})$ consistent with (V4), since the partition fulfils
(A) or (B).

Now as long some exists, for which n\ > max{n^, — ^^^^ — — 1}, (see (V3)), wc 'put' tiny 

blocks from P/ to the set P^ (by correcting the 04). It is easy to sec that for every valid magnitude 
we put at most two blocks to m, and the sum of these is at most Ipw-m. 

Clearly, the vertices $v_i = (II, i, \alpha_i)$ exist in $V_{II}$, and form an $m$-path in level $II$, as easily follows
from the graph definition. We increased every workload i ^ m by 0{d) ■ OPT, so the theorem 
follows. □ 



5 The deterministic algorithm 

This section describes the deterministic monotone algorithm, in the form of two procedures (Sections 5.1
and 5.2), and the main algorithm Ptas (Section 5.3). In Section 5.4 we prove the monotonicity of
the Ptas. We will make use of an arbitrary fixed total order $\prec$ over the set of all configurations $\alpha$,
such that configurations of smaller total workload $|\alpha|$ are smaller according to $\prec$.

5.1 Computing an optimal path in $\mathcal{H}_I$

Procedure OptPath (see Figure 1) is a common dynamic programming algorithm that finds an
$m$-path of minimum makespan in $\mathcal{H}_I$. However, we do not simply proceed from left to right over the
$m$ graph layers, but select an optimal path from the first layer to every node in $V_I$, and similarly,
an optimal path from layer $m$ to each node in $V_{II}$. Finally, we test each vertex in $V_{II}$ as
a potential switch vertex (i.e., we find optimal paths leading to the switch vertex from both end
layers). When the makespan of two prefix (or suffix) paths is the same, we break ties according to
$\prec$. We choose a switch vertex $v_k$ providing optimum makespan, and of maximum possible $k$. The
case $k = m-2$ needs careful optimization. Roughly, we choose deterministically by some fixed
order of the configuration triples $(\alpha_{m-2}, \alpha_{m-1}, \alpha_m)$, but minimize the makespan on the last three
machines by redistributing the tiny blocks. The flexibility provided by three machines with tiny
blocks facilitates monotone allocation in this degenerate case as well.

A pseudo-code of OptPath is presented in Figure 1. Observe that by the definition of $M()$
and $opt()$ values, $M(v_k) = M(Q)$. Moreover, since in each case the pointers $pred()$ and $succ()$
determine an incoming, and an outgoing path of minimum makespan, respectively, we can make
the following observation:

Observation. For all $v \in V_{II}$, the value $M(v)$ is the minimum makespan over all $m$-paths having
vertex $v$ as switch vertex. Consequently, $M(v_k) = M(Q)$ is the minimum makespan over all $m$-paths.
Procedure 1 OptPath

Input: The directed graph $\mathcal{H}_I$.

Output: The optimal $m$-path $Q = (v_1, \ldots, v_m)$ of $\mathcal{H}_I$.

1. for every double vertex f^_2 = {pLm-2,otm-i) € Vn do 

opt{v!^_2) := msiK{f{vm-2),f{vm-i),fivm)} (whcrc am-1 determines a^, see Section 4) 
M{v') := max{^^, ^}; 
for i = m — 3 downto 1 do 

for every $v = (d, i, \alpha) \in V_{II}$ do

(i) $succ(v) := w$, if $opt(w) = \min\{opt(y) \mid (v, y) \in E\}$, and among such vertices of minimum
$opt()$ the configuration $\alpha$ of vertex $w$ is minimal wrt. $\prec$.

(ii) $opt(v) := \max\{f(v), opt(succ(v))\}$;
$M(v) := \max\{\frac{|\alpha|}{s_i}, opt(succ(v))\}$.

2. for every $v = (d, 1, \alpha) \in V_I$ do $opt(v) := f(v)$;
for $i = 2$ to $m-2$ do

for every $v = (d, i, \alpha) \in V_I \cup V_{II}$ do

(i) $pred(v) := w \in V_I$, if $opt(w) = \min\{opt(y) \mid y \in V_I, (y, v) \in E\}$, and among such
vertices of minimum $opt()$ the configuration $\alpha$ of vertex $w$ is minimal wrt. $\prec$.

(ii) if $v \in V_I$ then $opt(v) := \max\{f(v), opt(pred(v))\}$;
if $v \in V_{II}$ then $M(v) := \max\{M(v), opt(pred(v))\}$.

3. select an optimal switch vertex $v_k = (II, k, \alpha_k) \in V_{II}$, by the following objectives:

(i) $M(v_k) = \min\{M(v) \mid v \in V_{II}\}$;

(ii) the layer $k$ is maximum over all $v$ of minimum $M(v)$;

(iii) if A; = m — 2, then among all double vertices v!^_2 = {am-2, CKm-i) of minimum M{v!^_2) 
(and hidden configuration am), select Vk = v'jn-2 by the following objectives: 

(a) keep the order (A) (B) (cf. Section 4); 

(b) in case of (A), select an (d {Tarr^-2^Ta^_-^UTa^)) (i.e., with a common 
pool of tiny blocks) by some predefined ordering, then minimize the highest finish time 
q£ \am-2\ \am-i\ jomi ^^q^ miuimizc the second highest finish time among them 

Sm-2 ' Sm-l ' Sm ' ° ° 

(by redistributing the tiny jobs); 

(c) in case of (B), select an {am-2,C(m-i,(^m) by some predefined ordering, but so that 

lom-il + \ctm\ is maximized; 

(iv) if $k \le m-3$ then the configuration $\alpha_k$ is minimal wrt. $\prec$ over all $v$ of minimum $M(v)$
in layer $k$;

4. for $i = k-1$ downto 1 do $v_i := pred(v_{i+1})$;
for $i = k+1$ to $m$ do $v_i := succ(v_{i-1})$;

$Q := (v_1, \ldots, v_k, \ldots, v_m)$.

Figure 1: Procedure OptPath finds an $m$-path $Q$ of minimum $M(Q)$ in the directed graph.
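
The backward pass of step 1 is a standard bottleneck-path dynamic program over a layered graph. The sketch below shows only that core idea under simplifying assumptions: it ignores the two levels, the switch vertex, and the double vertices, and all names (`layers`, `successors`, `finish`, `tie_break`) are hypothetical stand-ins for the graph data.

```python
def min_bottleneck_path(layers, successors, finish, tie_break):
    """Find a first-to-last-layer path minimising the maximum finish time.

    layers     -- list of lists of vertex ids, first layer to last layer
    successors -- dict: vertex -> list of vertices in the next layer
    finish     -- dict: vertex -> finish time f(v)
    tie_break  -- dict: vertex -> sortable key (plays the role of the order on
                  configurations when several successors are equally good)
    Returns (makespan, path) or None if no complete path exists.
    """
    opt = {v: finish[v] for v in layers[-1]}
    succ = {v: None for v in layers[-1]}
    for layer in reversed(layers[:-1]):
        for v in layer:
            candidates = [y for y in successors.get(v, []) if y in opt]
            if not candidates:
                continue  # dead end: v cannot reach the last layer
            best = min(candidates, key=lambda y: (opt[y], tie_break[y]))
            succ[v] = best
            opt[v] = max(finish[v], opt[best])
    starts = [v for v in layers[0] if v in opt]
    if not starts:
        return None
    v = min(starts, key=lambda x: (opt[x], tie_break[x]))
    value, path = opt[v], []
    while v is not None:
        path.append(v)
        v = succ[v]
    return value, path
```

Each vertex is processed once and each arc is inspected once, so the pass is linear in the size of the graph; the full procedure additionally runs the symmetric forward pass on level $I$ and then compares candidate switch vertices.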
5.2 Constructing the final job allocation

Once an optimal $m$-path is found, we have to allocate the jobs of $P_I$ to the machines. This
is obvious for jobs that appear individually in some configuration of the path, but we need an
accurate description of how the tiny jobs are distributed, given the block representation. Procedure
Partition is detailed in Figure 2. Importantly, depending on whether the switch machine $k$ is
filled high (above $(1 - \epsilon/2)$ times the makespan) or low, it gets filled with tiny jobs below, resp.
above $|\alpha_k|$. This, again, will play an important role when showing monotonicity. Distributing the
tiny jobs when $k = m-2$ is a slightly more subtle procedure, operating with the same principle (a
low machine is filled over $|\alpha_i|$). In general, having a careful look at Partition, one can see that
the machines never get filled above the makespan $M(Q)$ of the input path. This is trivial
for machines without tiny jobs, and follows from the definition of finish time with the extra tiny
block, for other machines.

The next two lemmas characterize properties of the output job partition that are essential
for proving Theorem 4.

Lemma 4. If $Q = (v_1, \ldots, v_m)$ is the input path to procedure Partition, then the output $Q_1, \ldots, Q_m$
is, indeed, a partition of $P_I$, and the induced allocation $(Q_1, \ldots, Q_m)$ is canonical with the choice
$L_i := L_{\alpha_i}$.

Proof. The sets $\alpha_i \setminus T_{\alpha_i}$ are non-intersecting, as follows from Definitions 8 and 9, and (E1). The set
$T$, distributed last, contains exactly the (tiny) jobs missing from $\bigcup_{i=1}^{m} (\alpha_i \setminus T_{\alpha_i})$, so we really have a
partition of $P_I$.

We claim that $(L_{\alpha_i}, S_{\alpha_i})$ is a $\delta$-division of $Q_i$: The sets $(L_{\alpha_i}, S_{\alpha_i} \setminus T_{\alpha_i})$ form a $\delta$-division of
each $\alpha_i \setminus T_{\alpha_i}$, according to Lemma 3. We show that the tiny jobs allocated to any machine $i$ have size
at most $\rho w_i$. Note that $W$ denotes the total size of tiny blocks in $\bigcup_{h=1}^{i} \alpha_h$ in the $i$th round of step
3. Let $\tau$ denote the total size of jobs tiny for $w_i$ that were allocated (as non-tiny) in $\bigcup_{h=1}^{i-1} \alpha_h$. (S5)
implies that if $n^1_\lambda$ is in the second size vector of $\alpha_i$, then $W + \tau \le n^1_\lambda \cdot \rho w_i + \rho w_i$. Moreover, (V3)
implies that $n^1_\lambda \cdot \rho w_i \le \sum_{l \le \lambda} |C_l| - \rho w_i$, whenever $n^0_\lambda < n^1_\lambda$. So, $W + \tau \le \sum_{l \le \lambda} |C_l|$, i.e., there are
enough tiny jobs in $T$ from the classes $l \le \lambda$ to fulfil (ii) of step 3b.

Furthermore, (A1) holds by (E2), and (A2) holds because we defined the configurations and
$Scale()$ consistent with (A2), and tiny jobs are allocated in increasing order. □

Lemma 5. If Q = {vi, . . . , Vm) is the input path, then for the output Qi, . . . , Qm of Partition 
Hb^m <\%i< M(Q) for every machine i. 

Proof. For $i < k$, $\alpha_i$ contains no tiny jobs, and $Q_i = \alpha_i$. So, in this case $\frac{|Q_i|}{s_i} = \frac{|\alpha_i|}{s_i} = f(v_i) \le M(Q)$.

In Partition, the variable $W_i$ stands for the work of tiny blocks assigned to $i$ by $\alpha_i$. As for
$i = k$, $Q_k$ is the first set (possibly) containing tiny jobs, so it certainly receives an amount of
$W_k \pm \rho w_k$ tiny jobs, and not more than $W_k$ in case of HIGH-$k$.

It is straightforward to check that if A; < m — 3, then the last machine with tiny blocks (i G 

{m— 1, m}) receives at most Wi, and at least Wi—Qpwi work (due to the estimate n\ — '^"^ 



+3 



pw 

in the hidden configuration $\alpha_m$). The other machines $i > k$ get tiny jobs of total size $W_i \pm \rho w_i$.
Note that the $+\rho w_i$ overhead was calculated in these machines' finish times $f(v)$, and indirectly in
the makespan $M(Q)$.

Finally, in case $k = m-2$, only a low machine $i \in \{m-2, m-1, m\}$ may receive more work
than $|\alpha_i|$; and each machine receives at least $W_i - 6\rho w_i$ work of tiny jobs. □
Procedure 2 Partition

Input: The job set $P_I$, and an $m$-path $Q = (v_1, \ldots, v_m)$ with switch vertex $v_k$ in the graph $\mathcal{H}_I$.
Output: A partition $Q_1, Q_2, \ldots, Q_m$ of the set $P_I$.
Case LOW-$k$: $|\alpha_k|/s_k < (1 - \epsilon/2) \cdot M(Q)$.
Case HIGH-$k$: $|\alpha_k|/s_k \ge (1 - \epsilon/2) \cdot M(Q)$.

1. for $i = 1$ to $m$ do

$Q_i := \alpha_i \setminus T_{\alpha_i}$;

2. let $T = \{t_1, t_2, t_3, \ldots, t_{|T|}\} = P_I \setminus \bigcup_{i=1}^{m} Q_i$, so that the jobs $t_j$ are in increasing order, and this
order corresponds to the order of jobs in each class.

3a. if $k = m-2$, then start with an allocation of tiny jobs to $\{m-2, m-1, m\}$ (in the given
order) s.t. each machine $i$ gets at most $|T_{\alpha_i}|$ amount of tiny jobs (this is doable, because the
total number of tiny blocks is overestimated by 3 blocks in $\alpha_m$)

let M = max{^^, ^}; 

call $i \in \{m-2, m-1, m\}$ low, if $|\alpha_i|/s_i < (1 - 2\epsilon/3) \cdot M$, and high if $|\alpha_i|/s_i \ge (1 - \epsilon/2) \cdot M$;
correct the partition of tiny jobs (keeping the job order) so that

(i) if there is one low machine $i$, and two high machines, then $i$ receives at least $|\alpha_i|$ work;

(ii) if there are two non-high machines, then both receive at least $6\rho w$ work of tiny jobs.

3b. if $k \le m-3$, then let $W = 0$ and $r = 0$;
for $i = k$ to $m-1$ do

given $\alpha_i = (w, \mu, n^0, n^1)$ and $\lambda = \log_{(1+\delta)} \rho w$, let $W_i := (n^1_\lambda - n^0_\lambda) \cdot \rho w$;

(i) $W := W + W_i$;

(ii) if HIGH-$k$ then let $u$ be the maximum index in $T$ so that $\sum_{j=1}^{u} t_j \le W$;
if LOW-$k$ then let $u$ be the minimum index in $T$ so that $\sum_{j=1}^{u} t_j \ge W$;

(iii) $Q_i := Q_i \cup \{t_{r+1}, t_{r+2}, \ldots, t_u\}$;

(iv) $r := u$.

$Q_m := Q_m \cup \{t_{r+1}, \ldots, t_{|T|}\}$.

Figure 2: Procedure Partition allocates the jobs based on path $Q$ output by OptPath.
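
Step 3b amounts to cutting the sorted list of tiny jobs at prefix sums. The following minimal sketch illustrates this under the assumption that HIGH-$k$ cuts at the largest prefix not exceeding the accumulated block work $W$, and LOW-$k$ at the smallest prefix reaching it. The function and parameter names are hypothetical, and positive job sizes are assumed.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def distribute_tiny_jobs(tiny_jobs, block_work, high):
    """Split sorted tiny jobs among machines k, ..., m-1 and then m.

    tiny_jobs  -- job sizes t_1 <= t_2 <= ... (the set T)
    block_work -- [W_k, ..., W_{m-1}], work of tiny blocks per machine
    high       -- True for case HIGH-k, False for case LOW-k
    Returns one list per machine k, ..., m-1 plus a final list for machine m.
    """
    prefix = list(accumulate(tiny_jobs))  # prefix[j] = t_1 + ... + t_{j+1}
    parts, r, target = [], 0, 0
    for w_i in block_work:
        target += w_i  # step (i): W := W + W_i
        if high:
            # maximum u with t_1 + ... + t_u <= W
            u = bisect_right(prefix, target)
        elif target <= 0:
            u = 0
        else:
            # minimum u with t_1 + ... + t_u >= W (capped if T is too small)
            u = min(bisect_left(prefix, target) + 1, len(tiny_jobs))
        u = max(u, r)
        parts.append(tiny_jobs[r:u])  # step (iii)
        r = u                         # step (iv)
    parts.append(tiny_jobs[r:])       # the remaining jobs go to machine m
    return parts
```

With `tiny_jobs = [1, 1, 2, 3, 5]` and `block_work = [3, 4]`, the HIGH case stays below the targets 3 and 7, while the LOW case overshoots the first target by one job.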
The next lemma is a descriptive characterization of the final output allocation $P_1, \ldots, P_m$, to
be readily used in the monotonicity proof.

Lemma 6. Let $P_I$ denote the set of input jobs and $s$ be the vector of rounded speeds. Let $Q =
(v_1, \ldots, v_m)$ be the path output by OptPath with configurations $(\alpha_1, \ldots, \alpha_m)$ in the vertices, and
$v_k = (II, k, \alpha_k)$ be the switch vertex of the path. If $P_1, \ldots, P_m$ is the final partition output by Ptas,
then

(a) for $i < k$, $P_i = Q_i = \alpha_i$.
Moreover, if $k \le m-3$, then

(b) for $i > k$, $\frac{|P_i|}{s_i} > (1 - 6\delta) \cdot M(Q)$;

(c) for $k$ either $P_k = Q_k$ or (b) holds.

Proof. Point (a) is obvious, since $S_{\alpha_i} = \emptyset$ for every vertex $v_i \in V_I$, and by step 1 of Partition,
$L_{\alpha_i} = \alpha_i = Q_i$. Moreover, by (E2) $L_{\alpha_i}$ has the $i$th smallest weight, so $Q_i = P_i$.

Let $M = M(Q)$. For $i > k$ we first claim that $|\alpha_i| \ge (1 - 2\delta) \cdot M \cdot s_i$. Assuming the contrary,
$|\alpha_i| < (1 - 2\delta) \cdot M \cdot s_i$, we could put a job $p \in S_{\alpha_k}$ to machine $i$. Since by Proposition 3, $p \le
\delta|L_{\alpha_k}| \le \delta|\alpha_i|$ holds, this would increase $|\alpha_i|$ to at most $|\alpha'_i| \le (1 + \delta)|\alpha_i| < (1 - \delta) \cdot M \cdot s_i$. The
same would hold if $i$ increased its number of blocks by one. Notice that the workload of other
machines also changes if they have jobs from the class of $p$ (because we defined $\mathcal{H}_I$ consistent with
(A2)), however now they just get smaller jobs of the same class. Applying inequality (1), we obtain

$f(v'_i) \le \frac{|\alpha'_i| + \rho w}{s_i} < \frac{|\alpha'_i|(1 + \delta)}{s_i} < M$.

We decreased $|\alpha_k|$ (or even found a valid path with switch vertex $k+1$ if no small jobs remained
on $k$), without having increased the makespan, so $Q$ was not optimal. Now Lemma 5 and inequality
(1) yield $|Q_i| \ge |\alpha_i| - 6\rho w \ge |\alpha_i|(1 - 4\delta) \ge (1 - 6\delta) \cdot M \cdot s_i$. Since this holds for every $i > k$, it
also holds for the ordered partition, which proves (b).

In order to see (c), notice that (a) implies $P_k \ne Q_i$ for any $i < k$. For $i > k$ we showed above
that $|Q_i| \ge (1 - 6\delta) \cdot M \cdot s_i$. If the same holds for $i = k$, then (b) holds for $i = k$ as well; otherwise
$|Q_k| = \min_{i \ge k} |Q_i|$, so $P_k = Q_k$. □

Corollary 1. In step 5 of Ptas, the sets $Q_i$ are permuted only among machines $i \ge k$ of equal
rounded speed. Consequently, $|P_i|/s_i \le M(Q)$ for all $i$.

Proof. Assume that $k \le m-3$, and $k \le h < i$. If $s_i > s_h$ then by Lemma 6 we have that
$|Q_i| > (1 - 6\delta) \cdot M(Q) \cdot s_i \ge (1 - 6\delta) \cdot M(Q) \cdot (1 + \epsilon) \cdot s_h > M(Q) \cdot s_h \ge |Q_h|$. Now let $k = m-2$,
$h, i \in \{m-2, m-1, m\}$, and $s_h < s_i$. Since $|\alpha_i|$ is increasing, $|Q_h| > |Q_i|$ could happen only if $h$
receives tiny jobs, and has finish time about a factor of $(1 + \epsilon)$ higher than $i$. However, then either
the highest finish time, or the second highest finish time among the machines $\{m-2, m-1, m\}$
would not be minimized by redistributing the tiny jobs, as required by OptPath. □

Corollary 1 and Theorem 2 imply the following upper bound:
Theorem 3. For arbitrary input $I$, and any given $0 < \epsilon \le 1$, the deterministic algorithm Ptas
outputs a $(1 + 3\epsilon)$-approximate optimal allocation in time $Poly(n, m)$.

Proof. By Theorem 2, for $\delta \ll \epsilon$ the optimal path $Q$ in $\mathcal{H}_I$ has makespan $M(Q) \le (1 + \epsilon) OPT$, and

by Corollary 1 this remains an upper bound for the makespan of the output. Since Ptas rounds
the input speeds to integral powers of $(1 + \epsilon)$, we obtain an overall approximation factor of at most
$(1 + \epsilon)^2 \le (1 + 3\epsilon)$.
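
The last inequality is elementary. A short derivation, assuming $0 < \epsilon \le 1$ (so that $\epsilon^2 \le \epsilon$), where one factor $(1+\epsilon)$ comes from the path makespan bound and the other from rounding the speeds:

```latex
\[
(1+\epsilon)^2 \;=\; 1 + 2\epsilon + \epsilon^2 \;\le\; 1 + 2\epsilon + \epsilon \;=\; 1 + 3\epsilon
\qquad \text{for } 0 < \epsilon \le 1 .
\]
```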
Algorithm 3 Ptas

Input: machine speeds $\sigma_1 \le \sigma_2 \le \ldots \le \sigma_m$, job set $P_I = \{p_1, p_2, \ldots, p_n\}$, and desired precision $\epsilon$.
Output: A partition $P_1, P_2, \ldots, P_m$ of $P_I$.

1. for each $i \in [1, m]$, round the speed $\sigma_i$ up to the nearest power of $(1 + \epsilon)$;

2. based on the jobs $P_I$, rounded speeds $s_1 \le s_2 \le \ldots \le s_m$, and an appropriate $\delta \ll \epsilon$, construct
the graph $\mathcal{H}_I$;

3. run Procedure OptPath in order to obtain the optimal $m$-path $Q = (v_1, \ldots, v_m)$;

4. from $Q$ compute the partition $Q_1, Q_2, \ldots, Q_m$ by Procedure Partition;

5. let $P_1, P_2, \ldots, P_m$ be the sets of $\{Q_1, Q_2, \ldots, Q_m\}$, sorted by increasing order of weight $|Q_i|$;

output $P_1, P_2, \ldots, P_m$.

Figure 3: The deterministic monotone Ptas.
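
Steps 1 and 5 of Ptas are simple enough to sketch directly. The code below is an illustration only: the function names are hypothetical, speeds are assumed to be positive floats, and a small tolerance guards against floating-point error when a speed is already an exact power of $(1 + \epsilon)$.

```python
import math


def round_speed_up(speed, eps):
    """Step 1: round a positive speed up to the nearest power of (1 + eps)."""
    base = 1.0 + eps
    # The tolerance keeps exact powers from being pushed to the next exponent.
    exponent = math.ceil(math.log(speed) / math.log(base) - 1e-9)
    return base ** exponent


def order_by_weight(parts):
    """Step 5: sort the job sets Q_1, ..., Q_m by increasing total work."""
    return sorted(parts, key=sum)
```

For instance, with `eps = 0.5` a speed of 2.0 is rounded to 2.25, and `order_by_weight([[5], [1, 1], [3]])` returns `[[1, 1], [3], [5]]`. Since Python's sort is stable, sets of equal weight keep their relative order.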

In order to prove the running time bound, we show that for constant $\epsilon$, step 2. of Ptas can be
computed efficiently. The number of graph vertices $|V|$ is $O(m \cdot A)$, where $A$ denotes the number of
different configurations. Every configuration (including the double configurations of triple length)
is determined by $O(\log_{(1+\delta)} 1/\rho) = O((1/\delta)\cdot\log 1/\delta)$ coordinates, each of which (including $\mu$) may
take at most $n+1$ different values, so $A = n^{O((1/\delta)\cdot\log 1/\delta)}$. Finally, for any $v, v' \in V$, deciding
whether $(v, v') \in E$ takes time linear in the number of these coordinates. Thus for fixed $\delta$, step 2.
is $\mathrm{poly}(n,m)$, and steps 3., 4. and 5. are obviously polynomial, which completes the proof. □
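
The estimate on the number of coordinates can be unpacked as follows. This is a sketch added for illustration; it assumes $0 < \delta \le 1$ and that $1/\rho$ is bounded by a polynomial in $1/\delta$, which is how the stated equality between the two $O(\cdot)$ terms is read here.

```latex
\[
\log_{(1+\delta)} \frac{1}{\rho}
  \;=\; \frac{\ln (1/\rho)}{\ln (1+\delta)}
  \;\le\; \frac{2}{\delta}\,\ln \frac{1}{\rho}
  \;=\; O\!\left(\frac{1}{\delta}\,\log\frac{1}{\delta}\right),
\qquad\text{using } \ln(1+\delta) \;\ge\; \delta - \frac{\delta^2}{2} \;\ge\; \frac{\delta}{2}
\text{ for } 0 < \delta \le 1 .
\]
```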

5.3 The monotone PTAS 

The monotone Ptas is presented in Figure 3. A substantial property of the output is that workloads
$Q_i$ without small jobs do not get permuted in step 5. of Ptas. This is due to the fact that the
sets $L_i$ of large jobs are increasing by (E2) (resp. that $\alpha_i$ is increasing in case of (A)). On the
other hand, machines having small jobs, except for the switch machine, have finish time close to
the makespan $M(Q)$ (resp. finish time of small difference in (A)). As a consequence, we obtain
that in step 5. the sets $Q_i$ are permuted only among machines $i > k$ of equal rounded speed $s_i$.
Therefore, even for the permuted workloads, $|P_i|/s_i \le M(Q)$ holds for all $i$. This, in turn, together
with Theorem 2 implies the following:

Theorem 3. For arbitrary input $I$, and any given $0 < \epsilon < 1$, the deterministic algorithm Ptas
outputs a $(1+3\epsilon)$-approximate optimal allocation in time $\mathrm{Poly}(n,m)$.

5.4 Monotonicity 

Our main result is the following theorem. 
Theorem 4. Algorithm Ptas is monotone. 

Proof. Assume that machine $i$ alone decreased its speed $\sigma_i$ to $\sigma_i'$ in the input. If the vector of
rounded speeds $(s_1, s_2, \ldots, s_m)$ remains the same, then the deterministic Ptas outputs the same
allocation, and $i$ receives the same, or smaller workload, since the output workloads $P_1, P_2, \ldots, P_m$
are in increasing order. Assuming that the rounded speed $s_i$ decreased as well, it is enough to
consider the special case when $i$ is the first (smallest index) machine of rounded speed $s_i = (1+\epsilon)$
in input $I(P, s)$, and after reducing its speed, it becomes the last (highest index) machine of rounded
speed 1 in input $I'(P, s')$. Since the workloads in the final allocation $P_1, \ldots, P_m$ are ordered, this
implies monotonicity for every 'one-step' speed change (like $(1+\epsilon) \to 1$). Monotonicity in general
can then be obtained by applying such a step repeatedly. Note that for both inputs the algorithm
constructs the same graph, independently of the speed vector. We assume that with inputs $I$ and
$I'$, OptPath outputs $Q = (v_1, \ldots, v_m)$ and $Q' = (v_1', \ldots, v_m')$, where the switch vertices have index
$k$ and $k'$, respectively. Finish time, makespan, etc. w.r.t. the new speed vector $s'$ are denoted by
$f'(\cdot), M'(\cdot)$ etc. We prove that $|P_i| \ge |P_i'|$.
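
The role played by the sorted output can be illustrated on a toy example. The Python sketch below is not the Ptas of Figure 3: `toy_sorted_allocation` is a hypothetical stand-in that builds bins by a greedy rule ignoring the speeds, and then, in the spirit of step 5, hands the lightest bin to the slowest machine (ties broken by index). The helper `workload_is_monotone` simply compares the workload of machine `i` before and after its speed is lowered.

```python
from typing import Callable, List, Sequence

Allocation = Callable[[Sequence[float], Sequence[float]], List[float]]


def toy_sorted_allocation(jobs: Sequence[float], speeds: Sequence[float]) -> List[float]:
    """Workload per machine under a toy rule (an assumption, NOT the paper's Ptas)."""
    m = len(speeds)
    bins = [0.0] * m
    # Greedy: largest job first, always into the currently lightest bin.
    for p in sorted(jobs, reverse=True):
        j = min(range(m), key=lambda b: bins[b])
        bins[j] += p
    weights = sorted(bins)
    # Slowest machine first; ties in speed are broken by machine index.
    order = sorted(range(m), key=lambda i: (speeds[i], i))
    work = [0.0] * m
    for rank, i in enumerate(order):
        work[i] = weights[rank]
    return work


def workload_is_monotone(alloc: Allocation, jobs: Sequence[float],
                         speeds: Sequence[float], i: int, slower: float) -> bool:
    """True if lowering the speed of machine i to `slower` does not raise its workload."""
    before = alloc(jobs, speeds)[i]
    new_speeds = list(speeds)
    new_speeds[i] = slower
    after = alloc(jobs, new_speeds)[i]
    return after <= before


if __name__ == "__main__":
    jobs = [7, 5, 4, 3, 1]
    print(toy_sorted_allocation(jobs, [1, 2, 4]))
    print(workload_is_monotone(toy_sorted_allocation, jobs, [1, 2, 4], 2, 0.5))
```

In this toy rule the bins do not depend on the speeds, so monotonicity follows from the ordering alone. For Ptas the path $Q'$ may differ from $Q$, which is why the case analysis that follows is needed.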

We start with a simple observation. Since we decreased a machine speed, it follows from the
definition of makespan that for any path $\mathcal{R} = (v_1, v_2, \ldots, v_r)$ in layer $V_I$, or in layer $V_{II}$, and for
any $m$-path, $M'(\mathcal{R}) \ge M(\mathcal{R})$. Similarly, for any vertex $v$, $opt'(v) \ge opt(v)$, and for any $v \in V_{II}$,
$M'(v) \ge M(v)$ (cf. Procedure OptPath). Obviously, also the optimum makespan over all $m$-paths
could not decrease. We elaborate on the subtle case of $k = m-2$ in a separate lemma; in what
follows, we assume $k \le m-3$.

CASE 1: $M'(Q) > M(Q)$

In this case machine $i$ with the new rounded speed $s_i' = 1$ becomes a bottleneck in path $Q$.
That is, $M'(Q) = f'(v_i) = |\alpha_i|/s_i' = |\alpha_i|$.

If $i < k$, then $\alpha_i = P_i$, so the machine received exactly $|\alpha_i|$ work with speed $s_i$, and now $Q$ is a
path with makespan $|\alpha_i|$, so by Corollary 1, and by the optimality of $Q'$ we have
$|P_i'| = |P_i'|/s_i' \le M'(Q') \le M'(Q) = |\alpha_i| = |P_i|$.

Recall that $s_i = (1+\epsilon)$. Let us introduce the notation $B \stackrel{def}{=} (1-6\delta)\cdot M(Q)$ for the lower bound
of Lemma 6. Assume now that $i \ge k$ and $|P_i|/(1+\epsilon) \ge B$.

Due to (E2), for the job partition $Q_1, \ldots, Q_m$ (before ordering the sets by size), it holds that
$|Q_h| \ge |L_{\alpha_i}|$ for every $h \ge i$. Therefore, for the $i$th largest set $P_i$, we have $|P_i| \ge |L_{\alpha_i}|$, and so
$|P_i| \ge \max\{|L_{\alpha_i}|, (1+\epsilon)\cdot B\}$.

We modify the path $Q$ and construct a new path $Q''$ by 'putting' small jobs from $S_{\alpha_i}$ (of machine
$i$) onto machine $i+1$, until the moved jobs have total weight of at least $7\delta\cdot(1+\epsilon)\cdot M(Q)$, or $S_{\alpha_i}$
becomes empty. For the new finish time (using (1)) we have
$f'(v_i'') \le \max\{|L_{\alpha_i}|, (1+\epsilon)\cdot M(Q)(1-7\delta) + \rho w_i\} \le \max\{|L_{\alpha_i}|, (1+\epsilon)\cdot B\} \le |P_i|$.
If $i = m$, then we put only tiny blocks of the common magnitude $w_{m-1}$ onto $m-1$, and use the
workload of $\alpha_m$ without its tiny blocks instead of $|L_{\alpha_m}|$ in the calculation.

We claim that with speed $s_i' = 1$ machine $i$ is a bottleneck machine in both paths $Q$ and
$Q''$. For $i > k$, it follows from the optimality of $Q$ (as shown in the proof of Lemma 6) that
$|\alpha_i| \ge (1+\epsilon)\cdot M(Q)(1-2\delta)$. (Note that, as a consequence, $M'(Q) = M(Q)$ is possible only
if $i \le k$.) For $i = k$ it follows from Partition and by assumption on $|P_i|$ that $|\alpha_i| \ge |Q_i| \ge
(1+\epsilon)\cdot M(Q)(1-6\delta)$. After removing the small jobs, in the new path $f'(v_i'') = M'(Q'')$, because
$f'(v_i'') \ge (1+\epsilon)\cdot M(Q)(1-13\delta) > M(Q)$. Thus, $f'(v_i'')$ is an upper bound on the new optimal
path-makespan, and it is at most $|P_i|$.

By Lemma 6, it remains to consider the case $i = k$, and $P_i = Q_i$. Given $M'(Q) > M(Q)$, we
have $M'(Q') \le M'(Q) = |\alpha_i|$ as an upper bound on $|P_i'|$. Assuming that for $i = k$ Low-$k$ holds with
speed $s_i$, $|P_i| = |Q_i| \ge |\alpha_i|$ by Partition 3b, and we are done. Assuming High-$k$, $|P_i| = |Q_i|$ is at
least the maximum of $|\alpha_i| - \rho w_i$ and the workload of $\alpha_i$ without its tiny blocks. On the other hand,
$M'(Q) = |\alpha_i| \ge (1-\epsilon/2) M(Q)\cdot(1+\epsilon) \ge (1+\epsilon/3) M(Q)$.
By putting one tiny block onto the next machine (if there are any), we still obtain a path $Q''$, with
makespan $M'(Q'') = |\alpha_i''|$ equal to the same maximum, which is larger than $M(Q)$, and we are done.
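
The numerical step $(1-\epsilon/2)(1+\epsilon) \ge (1+\epsilon/3)$ used above can be expanded as follows. The worked inequality is added for illustration only; the range $\epsilon \le 1/3$ is what the algebra requires and is stated here as an assumption, since the text does not spell it out at this point.

```latex
\[
\Bigl(1-\frac{\epsilon}{2}\Bigr)(1+\epsilon)
  \;=\; 1 + \frac{\epsilon}{2} - \frac{\epsilon^2}{2}
  \;\ge\; 1 + \frac{\epsilon}{3}
\quad\Longleftrightarrow\quad
\frac{\epsilon}{6} \;\ge\; \frac{\epsilon^2}{2}
\quad\Longleftrightarrow\quad
\epsilon \;\le\; \frac{1}{3}
\qquad (\epsilon > 0).
\]
```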

CASE 2. $M'(Q) = M(Q)$, and $i \le k$.

If $M'(Q) = M(Q)$, and $Q = Q'$, then the output of Partition can only be different if $i = k$.
Since the makespan did not change, and $s_k$ decreased, the change is from Low-$k$ to High-$k$, and
machine $i = k$ receives less work with $s_k'$, by Partition 3b. If, on the other hand, Partition
outputs the same partition, then every machine gets the same workload.

Suppose $M'(Q) = M(Q)$, but $Q \ne Q'$. Since $Q$ has minimum makespan, $M'(Q') = M'(Q)$. We
claim that also $k' = k$. Otherwise $Q'$ would have been better than $Q$ for input $s$ as well, because
$M(Q') \le M'(Q') = M(Q)$ and $k' > k$. Similarly, also $v_k = v_k'$, otherwise $\alpha_k' \prec \alpha_k$ would hold, and
$Q'$ would have been better for input $s$ as well.

Now, since $Q \ne Q'$, a maximum $h < k$ exists so that $v_h \ne v_h'$. This means that $pred(v_{h+1}) \ne
pred'(v_{h+1})$. If $v_h'$ was preferred in $Q'$ because $\alpha_h' \prec \alpha_h$, then $opt(v_h) < opt(v_h') \le opt'(v_h') \le
opt'(v_h)$. The first inequality holds, otherwise $v_h' = pred(v_{h+1})$ would have been the choice. The
second holds for every vertex. The third holds, otherwise $v_h = pred'(v_{h+1})$ would have been
the choice of OptPath. Similarly, if $v_h'$ was preferred in $Q'$ because $opt'(v_h') < opt'(v_h)$, then
$opt(v_h) \le opt(v_h') \le opt'(v_h') < opt'(v_h)$. In both cases we obtained $opt(v_h) < opt'(v_h)$. Recall that
$opt(v_h)$ is the optimum makespan over all paths leading to $v_h$ from layer 1. This could strictly
increase only if $i \le h$, and $i$ with workload $\alpha_i = P_i$ and speed $s_i' = 1$ became a bottleneck machine
in $(v_1, v_2, \ldots, v_h)$. Therefore, $|P_i'| \le opt'(v_h') \le opt'(v_h) \le |\alpha_i|/s_i' = |P_i|$.

Lemma 7. If on input $I(P, s)$, for the output path $Q$ of OptPath the switch index is $k = m-2$,
then $|P_i| \ge |P_i'|$.

Proof. For $i < k$ the proof is exactly the same as in CASES 1. and 2. of the theorem, since the
structure of the output solution on machines $i > k$ does not affect that argument. In the rest of
the proof we assume that $i \ge k = m-2$. In this case, for $i < k$ clearly $Q_i = L_i$ is non-decreasing.
Furthermore we make use of the following:

Claim 1. Step 3a. of Partition can be realized so that $|Q_{m-2}| \le |Q_{m-1}| \le |Q_m|$ holds.

The claim implies $Q_i = P_i$ for all $i \in [1, m]$, so no permutation in step 5. of Ptas takes place,
which simplifies the monotonicity argument below.

To see the claim, observe that for every path where $k = m-2$, the workloads $|\alpha_i|$, both with and without their tiny blocks, are non-decreasing
(cf. (A) and (B) in Section 4). It is not hard to see that Partition can allocate the tiny jobs in
increasing order to the machines so that $|Q_i|$ is also increasing, and Lemma 5 still holds. (Here we
exploit that the number of machines with tiny jobs is constant.) Now consider 3a. of Partition. If
machine $Q_i$ is increased (corrected) because of (i), then $i$ still gets much less work than any higher
index machines; if $Q_i$ is increased because of (ii), then $i$ has about 6 tiny blocks whereas the other
non-high machine gets at least 12 tiny blocks (and the bottleneck machine gets no blocks). Thus,
no higher index machine can get less work than $i$, and the proof of the claim is completed.

We do not give a detailed proof for the cases (A)(i) and (B)(i). In both cases all tiny jobs are
on the last two machines. These jobs can be allocated to machines $m-1, m$ in increasing order,
so that the makespan on $(m-1, m)$ is optimized, and this optimized makespan can be computed
exactly already during the path optimization. It can obviously be assumed in both (A) and (B)
that $|\alpha_{m-2}| \le |\alpha_{m-1}| \le |\alpha_m|$. The proof is therefore analogous to the proof of case $i < k$.

Now suppose that $i \ge k = m-2$, and for path $Q$ (A)(ii) holds. Let M = max{l|^, ^}.
Assume first that machine $i$ with speed $s_i$ was a non-low machine in 3a. of Partition. Then, with
speed $s_i' = 1$ machine $i$ becomes a bottleneck in the subpath $(v_{m-2}, v_{m-1}, v_m)$, and maybe even in
the path $Q$. Moreover, by $\delta \ll \epsilon$, machine $i$ is still a bottleneck in a modified (sub)path $Q''$, where
we put 7 tiny blocks, or all tiny blocks from $i$ to another one of the last three machines. That
is, the makespan of $Q''$ (resp. the local makespan on $(m-2, m-1, m)$) is the maximum of
$|\alpha_i| - 7\rho w_i$ and the workload of $\alpha_i$ without its tiny blocks,
which is an upper bound on $|P_i'|$, and a lower bound on $|P_i|$, by Lemma 5.
Now assume that i was a low machine in 3a. of Partition. 

If there was another non-high machine $i'$ then, by the optimization rules 3. (iii) of OptPath,
only $i$ and $i'$ have tiny blocks in $Q$. Having changed the speed to $s_i'$, we construct a path $Q''$ by
putting tiny blocks (when necessary) from $i$ to $i'$ so that their maximum finish time is minimized.
If, with the optimized tiny blocks, the local makespan remains $M$ then $Q' = Q''$ is the new output
solution. (Any path preferred to $Q''$ would have been preferred with speed $s_i$ as well.) Obviously,
in $Q''$ machine $i$ gets no more tiny blocks than in $Q$, and in Partition $i$ receives a subset of the tiny jobs
that it received with speed $s_i$. If the local makespan increases then $i$ becomes a (local) bottleneck
so it has at most 6 blocks in $Q''$ (further blocks could be put on $i'$). Thus, the workload of $\alpha_i$
without its tiny blocks, plus $6\rho w_i$, is an upper bound on the new local makespan and also on $|P_i'|$,
whereas, by Partition 3a (ii) it is a lower bound on $|P_i|$.

Suppose that $i$ was a low machine, and the other two were both high machines. It easily
follows from 3 (iii) of OptPath that $i$ has nearly all tiny blocks with speed $s_i$, whereas by 3a
(i) of Partition $|P_i| \ge |\alpha_i|$ holds. Consider now the same path $Q$ with speed $s_i'$. If $i$ becomes
a local bottleneck (i.e., among $\{m-2, m-1, m\}$) then $|\alpha_i|$ is an upper bound on the optimal
(local) makespan of $Q'$, so that $|P_i'| \le |\alpha_i|$, and we are done. If $i$ has the second highest finish
time among $\{m-2, m-1, m\}$, then the output path remains basically the same (by 3 (iii) of
OptPath), possibly optimizing the second finish time by removing blocks from $i$. Then $i$ is not a low
machine anymore, and $|P_i'| \le |\alpha_i|$. If $i$ still has the lowest finish time, then the output path is the
same, and in Partition $i$ gets the same set, or a subset of its previous jobs in case it becomes a
non-low machine.

Finally, we turn to the case when $i \ge k = m-2$, and for path $Q$ (B) (ii) holds. We claim that
the optimality of $Q$ implies that machines $m-1$ and $m$ have finish time at least $(1-6\delta)M(Q)$ (cf.
Lemma 6). Based on this, one can easily verify that the allocation of Partition is essentially the
same as in the case $k \le m-3$, and the monotonicity proof is analogous.

In the rest of the proof, we show that the claim holds. Recall that in case (B) (ii), OptPath
maximizes $|\alpha_{m-1}| + |\alpha_m|$. As long as machine $m-2$ has at least one small job, the same argument
as in Lemma 6 can be used: if $m-1$ or $m$ are not filled enough, then we could shift a small job
from $m-2$ to these machines, increasing $|\alpha_{m-1}| + |\alpha_m|$, without increasing $M(Q)$, a contradiction.

Assume that $m-2$ has no small jobs, that is, $\alpha_{m-2} = L_{m-2}$. We show that the whole job
set $L_{m-2}$ has the size of at most that of a tiny job for $w_{m-1}$. By the condition (B), $\rho^2 w_{m-1} > w_{m-2}$.
Since $w_{m-2}$ is the maximum job size in $L_{m-2}$, and by property (D2) of $\delta$-divisions, we have
$w_{m-2} > \frac{\delta |L_{m-2}|}{(1+\delta)^2} > \rho |L_{m-2}|$. The latter inequality follows from $\rho < \delta/3$ in case $1+\delta < \sqrt{3}$. Putting
it together, we obtain $\rho^2 w_{m-1} > \rho |L_{m-2}|$, so $|L_{m-2}|$ is tiny for $w_{m-1}$. We define a path $Q''$ as
follows. We put all the jobs of $L_{m-2}$ to $m-1$, and shift the jobs of the corresponding classes to $m$,
when necessary (i.e., when $m-1$ was filled, but $m$ not). Also, shift every workload $\alpha_i = L_i$ to the
next machine for machines $i \le m-3$. Finally, note that the new partition is canonical: the jobs in
$L_{m-2}$ were the last large jobs in their class, now they become the first small jobs, so (A2) remains
valid. This means that we found a path better than the optimum $Q$, a contradiction. □

□ 